ABSTRACT


This  thesis  systematically  and  comprehensively  analyzes 
available  personnel  data  to  determine  if  a  significant 
relationship  exists  between  measures  of  intelligence  and 
academic  performance,  and  career  promotion  rate  for 
Noncommissioned  Officers.  Forty  thousand  Noncommissioned 
Officer  (NCO)  records  were  analyzed  to  determine  this,  using 
three approaches.

The  first  approach  was  a  sequential  procedure  which 
progressed  from  analysis  of  individual  variables  through 
multivariate  regression  models.  The  second  approach  focused 
on  analysis  of  NCO's  who  scored  in  the  top  three  percent  of 
promotion  rate.  The  third  approach  used  more  advanced 
statistical  techniques,  including  the  use  of  principal 
components  and  factor  analysis,  to  better  identify  the  most 
influential  explanatory  variables. 

During  the  analysis,  eight  measures  of  intelligence  and 
academic  ability  were  used  as  explanatory  variables.  Four 
control  variables  were  included  in  the  analysis  to 
discriminate  between  subcategories  of  NCO's.  They  were: 
sex,  career  field,  race,  and  paygrade. 

Throughout  the  analysis  consideration  of  Army  promotion 
and  accession  policy  was  included.  Knowledge  of  these 
policies  resulted  in  elimination  of  some  special  groups  which 
had  received  promotions  under  significantly  different 
conditions  than  the  rest  of  the  sample.  An  example  of  this 
was  Reserve  and  National  Guard  members  called  to  active  duty. 

This  study  found  that  there  was  significant  statistical 
evidence  to  show  that  a  high  level  of  Armed  Forces 
Qualification  Test  (AFQT)  score  and  prior  service  academic 
accomplishment  will  correspond  to  a  higher  promotion  rate. 
Also,  in-service  measures  of  NCO  education  and  performance 
testing  were  good  indicators  of  promotion  rate. 

However,  there  was  significant  variance  associated  with 
the  explanatory  relationship.  As  a  result,  a  useful 
predictive  model  could  not  be  designed  using  regression 
methods. Although the model could predict promotion averages
for  major  population  subcategories,  it  was  unreliable  when 
used  solely  with  the  AFQT  variable. 

The  findings  of  this  study  suggest  two  policy 
recommendations.  The  first  recommendation  was  a  confirmation 
of  the  constraints  placed  on  AFQT  category  and  high  school 
diploma  status  by  the  1984  Defense  Authorizations  Act.  The 
second  recommendation  was  to  require  promotion  boards  to 
consider  NCO  schooling  level  and  performance  test  scores  in 
their proceedings, but to avoid directly tying either score to
promotion,  in  terms  of  a  minimum  quota  or  scaled  promotion 
point  scale. 

Finally,  a  suggestion  was  given  for  further  research  to 
investigate  the  underlying  reasons  for  different  attrition 
patterns  observed  among  racial  and  ethnic  groups. 
I.  INTRODUCTION

A.   BACKGROUND 

In  almost  any  organization,  one  hopes  that  individuals  at 
high  levels  of  authority  are  gifted  with  higher  than  average 
intelligence.  Correspondingly,  one  would  think  that,  given 
equal work effort, a more intelligent person will advance
more  rapidly  than  his  contemporaries  in  an  organization. 

It  is  not  difficult,  however,  to  find  examples  which 
contradict  our  perceptions  of  the  role  of  intelligence  in 
career  advancement.  In  almost  any  field  one  can  remember  an 
individual  who  was  not  the  most  intellectually  gifted,  but 
through  hard  work  and  persistence,  or  other  less  quantifiable 
traits,  advanced  equally  or  better  than  persons  of  higher 
measured  mental  ability.  There  is  ample  room  for  other 
influences  to  overwhelm  the  value  of  a  person's  intelligence 
in  the  eyes  of  a  superior.  An  unattractive  personality,  an 
inability  to  apply  that  intelligence  to  the  tasks  at  hand, 
and  a  myriad  of  other  flaws  can  discredit  the  merit  of  raw 
intelligence.

The degree to which intelligence affects advancement
lies  in  the  area  of  complex  interaction  between  individuals 
and  organizations.  It  carries  with  it  much  of  the 
uncertainty  of  quantification  of  human  performance. 

Despite  ample  room  for  exceptions,  the  concept  of  a 
general reward for being more intelligent still seems
reasonable. It may be, however, that to clearly see its
manifestation  requires  looking  at  a  large  number  of  people 
who  have  been  affected  by  as  similar  a  set  of  opportunities 
for  advancement  as  possible.  It  is  the  task  of  this  thesis 
to  investigate  this  relationship  within  a  fairly  restricted, 
but  numerically  large  population.  The  population  is  one 
which has had fundamental raw statistics uniformly obtained,
and  where  policies  to  promote  personnel  are  unambiguous  and 
well  documented. 

B.  PURPOSE 

The  purpose  of  this  thesis  is  to  answer  a  central 
question:  Does  a  significant  relationship  exist  between 
measures  of  intelligence  and  academic  ability,  and  an 
individual's  promotion  rate  as  a  Noncommissioned  Officer? 
Put  more  simply,  does  being  smarter,  as  measured  by  initial 
test  scores,  or  being  better  schooled,  indicate  that  a  person 
will  perform  better  and,  hence,  advance  more  quickly  than  his 
peers? 

The  answer  to  this  question  has  important  implications 
for  Army  policies  of  recruitment,  retention,  and  promotion. 
It  is  also  a  matter  of  general  interest  to  social  scientists. 

C.  ORGANIZATION 

This  thesis  is  organized  fundamentally  as  a  data  analysis 
investigation.  Chapters  I  and  II  provide  preliminary 
information  on  the  nature  of  the  study  variables,  and  briefly 
review some related articles which have addressed this topic.
The  remaining  chapters  discuss  the  analysis  of  approximately 
forty-thousand  Noncommissioned  Officer  (NCO)  records  using 
three  related  approaches.  The  first  approach  is  a  fairly 
standard  procedure  of  experimental  data  analysis.  This 
procedure  begins  with  analysis  of  fundamental  attributes  of 
individual  variables,  then  advances  through  successive 
increases  in  dimensionality  and  complexity.  The  second 
approach  views  a  subset  of  the  population  which  distinguishes 
itself  by  being  in  the  top  three  percent  of  the  NCO  promotion 
rates.  Comparison  of  these  top  performers  to  the  remainder 
of  the  population  identifies  attributes  which  are  found  to  be 
significantly  different,  and  hence,  are  possibly  an 
associated  cause  for  rapid  advancement.  In  the  third 
approach,  the  statistical  methods  of  principal  components  and 
factor  analysis  are  used  to  provide  an  alternative  method  of 
critical  variable  selection,  as  well  as  to  lend  credibility 
to  the  results  of  the  other  two  approaches. 
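
The second approach can be pictured with a short sketch. It is illustrative only: the field names `promotion_rate` and `afqt`, the helper `split_top_fraction`, and the generated records are hypothetical stand-ins and are not drawn from the thesis data. The sketch ranks records by promotion rate, separates the top three percent, and compares the mean of one attribute between the two groups.

```python
from statistics import mean

def split_top_fraction(records, key, fraction=0.03):
    """Return (top, rest), where top holds the highest `fraction` of records by `key`."""
    ranked = sorted(records, key=lambda r: r[key], reverse=True)
    n_top = max(1, round(len(ranked) * fraction))
    return ranked[:n_top], ranked[n_top:]

# Hypothetical records; real promotion rates and AFQT scores would come from personnel files.
records = [{"promotion_rate": 0.5 + 0.01 * i, "afqt": 30 + (i % 60)} for i in range(100)]

top, rest = split_top_fraction(records, "promotion_rate")

# Compare one attribute between the top performers and the remainder.
print(len(top), len(rest))
print(mean(r["afqt"] for r in top), mean(r["afqt"] for r in rest))
```

A difference in group means alone does not establish significance; a formal test would still be required before drawing conclusions.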

D.   PRELIMINARY  INFORMATION 

This  section  contains  an  initial  discussion  about  the 
nature  of  the  data,  a  general  overview  of  the  Army  NCO 
promotion  system,  and  a  synopsis  of  the  analytical  tools  used 
in  this  thesis.  As  previously  mentioned,  there  is  a  degree 
of  looseness  in  the  effectiveness  of  measurement  for 
intelligence  and  academic  data,  and  also  some  confounding 
phenomena in Army promotion policy. Early recognition of
these problems should set the degree of caution which is
needed  in  reviewing  the  subsequent  chapters  of  analysis.  The 
section  on  analytical  tools  is  intended  to  inform  the  reader 
of  the  conditions  under  which  the  data  analysis  was 
conducted,  and  the  hardware  and  software  used. 

1.  Intelligence Test Scores
a.   General 

The  data  for  intelligence  test  scores  falls  into 
the  category  sometimes  referred  to  as  Defined  Measurement.  A 
Defined  Measurement  is  one  where  the  property  being 
considered cannot be measured directly. [Ref. 1:p. 6] As a
result,  a  related  measure  is  substituted  for  measurement  of 
the  actual  property.  In  this  case,  the  property  is 
intelligence,  and  the  presumed  related  measurements  are  test 
scores  from  a  particular  battery  of  tests. 

The  efficacy  of  intelligence  tests  as  a  representative 
measure  for  intellectual  ability  is  itself  an  issue 
surrounded  by  controversy.  This  controversy  has  been  the 
topic  of  entire  books  and  studies.  The  testing  done  by  the 
Army  is  the  Armed  Forces  Vocational  Aptitude  Battery,  or 
ASVAB.  Although  not  designed  specifically  as  an  intelligence 
test, the ASVAB does predict general trainability.
Additional  research  has  shown  that  the  mathematical  and 
verbal  portions  of  the  ASVAB  have  a  high  correlation  to  the 
ACT, PSAT, and SAT college entrance examinations. [Ref. 2]
The ASVAB has been studied, improved, and used for over forty
years. A recent article by Jensen [Ref. 3:p. 35], in
Measurement and Evaluation in Counseling and Development,
states:

"To  the  degree  that  success  in  various  occupations  and 
training  programs  requires  different  levels  of  general 
ability  (often  called  intelligence  or  IQ),  an  ASVAB 
composite  (it  hardly  matters  which  one)  will  be  as 
validly  predictive  as  any  test  now  on  the  market.  .  .  It 
seems  that  the  new  ASVAB-14  is  near  the  limit  of 
refinement, psychometrically."

Generally  then,  the  ASVAB  is  a  well  documented  and 
established  aptitude  test.  Although  the  military  does  not 
specifically  attempt  to  determine  the  intelligence  of  its 
potential  candidates,  academic  portions  of  the  ASVAB  test 
have  shown  themselves  to  be  reasonably  defined  measurements 
of  intelligence. 

b.   Specific  Tests. 

The  ASVAB  consists  of  a  battery  of  ten  subtests. 
Composites  of  the  subtests  of  the  ASVAB  are  used  to  determine 
the  overall  acceptability  of  an  individual  requesting 
enlistment,  and  for  which  field  he  or  she  would  best  be 
suited. From the entire battery of tests, two derived scores
are taken as aggregate measures of intelligence. The first is
the GT, or general intelligence score. This score is the
aggregation of three submodules: word knowledge, paragraph
comprehension, and arithmetic reasoning. The second derived
measure of intelligence is the Armed Forces Qualification Test
score, or AFQT. This score considers four submodules: word
knowledge, paragraph comprehension, arithmetic reasoning, and
numerical operations. [Ref. 10:sec 1-0, p. 1] An AFQT score is
reported  as  a  percentile  score  representing  the  examinee's 
relative  standing  in  reference  to  a  specific  population. 

There  has  recently  been  some  additional  manipulation  of 
the  AFQT  score.  In  October  of  1984,  the  reference  population 
for  assignment  of  an  individual's  AFQT  percentile  was  shifted 
from  a  base  reference  population  of  1944  to  that  of  1980.  A 
base  reference  population  is  a  set  of  values  designed  to 
represent  how  the  raw  AFQT  scores  of  the  entire  American 
youth  population  would  be  distributed.  This  set  of  values 
was  originally  designed  in  1944,  and  had  not  been  updated 
until  1980.  This  thesis  utilized  the  1980  base  AFQT 
percentiles.  A  transformation  of  test  percentiles  for 
soldiers  who  enlisted  prior  to  1980  was  effected  by  the 
Defense  Manpower  Data  Center  (DMDC),  and  all  subsequent 
Department  of  the  Army  records  have  been  computed  based  on 
the  1980  reference.  A  listing  for  AFQT  percentile 
transformations  can  be  found  in  APPENDIX  A. 
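
The effect of changing the reference population can be illustrated with a minimal sketch. The two lists of reference scores below are invented for illustration and are not the 1944 or 1980 norming data; the function name `percentile_standing` and the "at or below" convention are likewise assumptions. The point is only that the same raw score maps to a different percentile when the reference population shifts.

```python
from bisect import bisect_right

def percentile_standing(raw_score, reference_scores):
    """Percent of the reference population scoring at or below raw_score."""
    ordered = sorted(reference_scores)
    at_or_below = bisect_right(ordered, raw_score)
    return 100.0 * at_or_below / len(ordered)

# Hypothetical raw scores for two reference populations (not real norming data).
reference_old = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]
reference_new = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95]

raw = 70
old_pct = percentile_standing(raw, reference_old)
new_pct = percentile_standing(raw, reference_new)
print(old_pct, new_pct)
```

In this invented case the newer reference population scores higher overall, so the same raw score earns a lower percentile. The actual direction and size of the 1944-to-1980 shift are given by the transformation table, not by this sketch.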

GT  scores,  which  are  expressed  as  the  sum  of  the  raw 
test  scores,  have  not  been  manipulated.  However,  unlike  the 
case with the AFQT score, soldiers have been allowed to
retake  their  tests  to  increase  their  original  GT  scores. 
Retesting  was  introduced  in  1982  when  a  minimum  GT  score  of 
120  was  enforced  on  eligibility  for  promotion  to  NCO  rank. 
2.  Academic Scores

a.  General 

The  data  used  for  academic  ability  is  also  a 
defined  measurement,  similar  to  the  measures  for 
intelligence.  Specifically,  the  property  of  academic  ability 
is  being  represented  by  a  simple  assignment  of  the  number  of 
years of education. This value is independent of the quality of
education, and the grades that any given individual may have
received.  This  study  assumes  that  continued  attendance  and 
progression  through  the  educational  system  is  inherently 
indicative of academic ability. For example, a high school
graduate is assumed to have more academic ability than an
individual with an eighth grade education. The informational
value of academic scores is thus not as great as desired. They
are treated in analysis only as ordinal-scaled variables.

b.  Specific 

Three  academic  scores  are  used  in  the  study: 
present  education  level,  education  level  upon  entry  into 
Army,  and  military  education  since  entry.  Because  advanced 
professional  schooling  is  made  available  only  to  those 
individuals  who  have  superior  service  records,  the  military 
education  score  carries  with  it  some  additional  information 
relative  to  the  performance  of  the  NCO. 

3.  Promotion Scores

Promotion  within  the  Army  is  a  closely  supervised  and 
somewhat complicated procedure. It is the product of a
considerable number of policies which are not uniformly
applied  across  the  population.  Instead,  they  are  applied 
within  rank  structure,  within  career  field,  or  even  as  a 
function  of  years  of  education.  Thus,  although  the 
computation  of  an  individual's  promotion  rate  is  an  easy 
task,  that  value  may  have  been  influenced  by  several  policies 
that were peculiar to the individual.
a.   General 

Promotion of NCO's is governed by Army Regulation
AR  600-200.  This  regulation  establishes  requirements  for 
eligibility,  and  outlines  the  process  of  selection.  The 
system  views  the  individual's  performance  as  a  whole.  This 
includes  a  composite  score  based  on  performance  scores, 
commander's  ratings,  service  awards,  and  review  by  a  board  of 
senior  NCO's.  This  composite  point  value  is  used  as  a 
threshold  value  for  the  Department  of  the  Army  to  use  when 
promoting  individuals  to  the  next  higher  paygrade,  as  slots 
become  available.  The  slots  are  accounted  for  by  career 
management  field,  and  as  such,  the  minimum  threshold  for  a 
combat  soldier  to  be  promoted  may  be  different  than  that  of  a 
support  soldier.  A  general  observation  is  that  career  fields 
with  more  technical  orientation  have  higher  promotion  point 
thresholds, and consequently, longer times to advancement
than those in the larger and less technically oriented career
fields.

AR 600-200 also sets minimum times in service and in grade
which an individual must have served to be considered for
promotion. Unless superseded by a special policy, the
shortest period for promotion to E-5 is two years, and to E-6
is four years. This rate includes waivers for both time
in  service  and  time  in  grade.  Promotion  to  E-6  in  four  years 
requires  that  the  individual  be  advanced  to  E-5  in  two  years. 

b.   Specific 

Because  of  the  lack  of  uniformity  of  promotion 
within the Army population, in this thesis we have taken
considerable  care  to  identify  and  address  discontinuities 
which  would  confound  promotion  based  on  merit.  This  includes 
the  elimination  of  some  data,  and  the  computation  of  three 
different  promotion  rate  scores.  The  governing  principle  for 
manipulation  or  restriction  of  data  was  to  produce  a  sample 
population  in  which  each  individual  started  from  the  same 
point  in  the  rank  structure,  and  had  equal  opportunity  for 
advancement  by  merit.  Chapter  III,  Overview  of  the  Data, 
discusses  in  detail  the  identified  problems  and  what 
corrective  action  was  taken. 
4.  Analytical Tools Used

This section briefly identifies the hardware and
software  used  in  analysis. 

a.   Hardware 

Computational  resources  used  for  analysis 
included  an  IBM  3033  System  370  mainframe  computer  running 
the MVS batch system. Additionally, analysis of small
data sets was done using a standard IBM microcomputer.
b.   Software 

Two  software  packages  were  used  for  the  majority 
of  the  data  analysis.  SAS  Version  5  was  used  predominantly 
for  analysis  resulting  in  tabular  output,  such  as  principal 
components and factor analysis. [Ref. 4,5] Grafstat, an
unreleased IBM mainframe data analysis and plotting program,
was utilized for analysis requiring graphical output and for
confirmation of SAS tabular results. [Ref. 6,7]

E.   SUMMARY 

The  objective  of  this  introduction  has  been  to  adequately 
frame  the  scope  of  the  topic,  and  to  present  sufficient 
background  to  the  reader  so  that  he  or  she  is  alerted  to  some 
of  the  difficulties  inherent  in  a  topic  of  this  nature. 
It also establishes a reference for some of the tools
used to conduct the analysis.

The  length  of  this  section  is  indicative  of  the  degree  of 
preparation  required  to  analyze  a  relationship  which  has 
significant  complications  in  both  dependent  and  independent 
variables. Although the list of assumptions and the
stripping of aberrant data make one cautious about the
reality of such a study, each event should be considered on
its ability to uncover the answer to the central question of
this thesis. The central question, again, is whether or not a
significant relationship exists between measures of
intelligence and academic ability, and an individual's
promotion rate as a Noncommissioned Officer. It is important
to  learn  whether  measures  of  intelligence  and  academic 
ability are important indicators of promotion in the Army,
and  if  so,  how  strong  that  relationship  is.  If  sufficiently 
reliable  and  believable  relationships  can  be  determined,  then 
policies  could  be  designed  to  better  identify  and  develop 
capable  individuals  for  positions  of  leadership. 

The  analysis  of  this  thesis  reduced  the  effects  of 
confounding  policies,  such  as  discriminatory  promotion  and 
accession  programs.  It  also  used  a  sufficiently  large  sample 
size,  which  allowed  the  averages  to  outweigh  the  exceptions. 
It  drew  on  data  from  standard  personnel  records,  and  made  the 
most  effective  use  of  that  information. 
II.  A REVIEW OF PREVIOUS STUDIES

The  topic  of  relating  intelligence  to  some  aspect  of 
performance  is  an  extensive  and  rich  area  of  study.  It  is  a 
particular  topic  of  interest  to  social  scientists  and 
military  manpower  specialists.  As  a  demonstration  of  the 
quantity of work done in this area, a simple cross-referencing
of the words "intelligence test" and "performance"
produced a list of 237 citations from Lockheed's DIALOG
online information files. Restriction of available
references  to  those  utilizing  military  intelligence  test 
scores  and  statistical  analysis  of  those  tests  relative  to 
some  performance  measure  still  results  in  a  large  number  of 
citations.  Within  this  restriction  there  is  a  variety  of 
study  methodologies.  The  source  of  a  study  can  originate 
from  an  in-house  military  analysis,  a  contracted  study  done 
by  a  commercial  analytical  institute,  or  an  academic 
institution making use of military data as its medium for
analysis.

The  nature  of  the  data  is  also  varied.  Several  studies 
readministered the ASVAB tests to a selected test population;
other studies used IQ and other intelligence measures in
addition to the ASVAB. The performance side of the
relationship had an extensive number of dependent variables.
Examples of performance measures were: results of written
exams, military skills test results, minority advancement,
and  comparison  to  collegiate  ACT,  PSAT,  and  SAT  tests. 

This  chapter  will  review  four  of  the  most  closely 
related  studies,  concentrating  for  each  one  on: 

1.  The  objective  of  the  study. 

2.  The  methodology  used  in  analysis. 

3.  The  conclusion  reached. 

The  first  analysis  is  from  Are  Smart  Tankers  Better? 
AFQT and Military Productivity. [Ref. 8] This study is
essentially  an  in-house  military  analysis,  the  authors  being 
Army  officers  assigned  to  the  Office  of  Economic  and  Manpower 
Analysis,  at  West  Point,  New  York.  As  described  in  the 
title,  the  paper  presents  the  results  of  an  investigation  in 
which  the  crews  of  tanks  were  scored  on  their  ability  to 
destroy targets on live fire ranges. The AFQT scores of the
gunner and tank commander were among several explanatory
variables, with the tank scores as the dependent variable.
The  analysis  methodology  used  a  log-log  production  model  with 
ordinary  least  squares  regression. 

The result of their analysis is best summarized in this
paragraph from the study:

"That  there  exists  a  positive,  statistically 
significant  relationship  between  AFQT  and  performance,  is 
a  powerful  result.  The  coefficients  on  the  model  means 
that  if  we  move,  for  example,  from  the  AFQT  score  for  an 
average  Category  IV  TC  to  the  AFQT  score  for  an  average 
Category IIIA TC, (a 200% increase), we will increase the
performance  on  Table  8  (the  tank  scoring  exercise)  by 
approximately  20.3%." 
In this study then, AFQT was found, by means of least squares
regression,  to  have  a  definitive  relationship  to  a  well- 
defined  skill  measure,  the  conduct  of  tank  firing. 
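
The quoted figures can be related to the log-log form with a short calculation. This is a rough illustration under stated assumptions: the model is taken to have a single coefficient b on ln(AFQT) with all other variables held fixed, and "a 200% increase" is read as a tripling of the score. Neither the helper name `implied_elasticity` nor this reading of the quotation comes from the cited study.

```python
import math

def implied_elasticity(input_ratio, output_ratio):
    """Coefficient b in ln(y) = a + b * ln(x) implied by a pair of ratios.

    If x is multiplied by input_ratio and y by output_ratio, then
    ln(output_ratio) = b * ln(input_ratio).
    """
    return math.log(output_ratio) / math.log(input_ratio)

afqt_ratio = 3.0    # assumption: "200% increase" means the score triples
perf_ratio = 1.203  # the quoted 20.3% improvement on Table 8

b = implied_elasticity(afqt_ratio, perf_ratio)
print(round(b, 3))
```

Under these assumptions the elasticity is well below one, meaning that a given percentage change in AFQT corresponds to a considerably smaller percentage change in the gunnery score.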

The  second  study  is  an  analysis  done  at  the  University  of 
Iowa by the Cada Research Group titled: On Predicting
Success in Training for Males and Females: Marine Corps
Clerical Specialties and ASVAB Forms 6 and 7. [Ref. 9] This
report  uses  the  ASVAB  score  as  an  explanatory  variable  for 
success  of  recruits  in  training.  The  methodology  used  is 
primarily  regression;  however,  the  scope  of  the  regression 
concentrates  on  identifying  differences  between  male  and 
female  performance.  The  implicit  result  in  the  study's 
discussion  of  the  sex  score  differences  is  that  the 
regressions performed for each category were of useful
predictive value. An interesting note about this study is
that the inclusion of high school completion reduces the
difference between the male and female regression
coefficients.

The  third  study  is  a  section  of  articles  used  in  the 
Report to the House and Senate Committees on Armed Services,
Defense Manpower Quality, Volume II, Army Submission.
[Ref. 10] The section of interest to this thesis was a study
done  by  the  U.  S.  Army  Training  and  Doctrine  Command  (TRADOC) 
Systems  Analysis  Activity  (TRASANA).  The  study  uses  AFQT,  as 
well  as  education  level,  sex,  paygrade,  time  in  service,  time 
in Military Occupational Specialty (MOS), and a dummy
variable reflecting General Equivalency Diploma (GED)
completion  as  explanatory  variables.  GED  is  a  rating  given 
to  individuals  who  did  not  graduate  from  high  school,  but  who 
have  taken  examinations  to  be  rated  as  equivalent  to  a  high 
school  graduate.  A  battery  of  tests  given  under  controlled 
conditions  resulted  in  a  net  score  which  was  made  the 
dependent  variable.  The  battery  of  tests  was  designed  so  as 
to  represent  how  proficient  a  soldier  was  in  his  specific 
career field. The battery included both a written and a
hands-on proficiency test.

The  analysis  method  used  was  linear  regression,  with  the 
inclusion  of  a  Durbin  Instrument  as  a  correction  tool  for 
AFQT. The results are again best summarized from the report:

"The  most  important  result  is  that  AFQT  Category  I-IIIA 
soldiers  performed  approximately  10%  better  overall  than 
IIIB  soldiers.  .  .  Furthermore,  AFQT  was  a  much  more 
important  influence  on  performance  in  virtually  all 
instances  than  either  education  or  experience,  whether 
measured  in  terms  of  time  in  service,  MOS,  or  unit. 
Thus,  these  results  strongly  support  the  validity  of  AFQT 
as  a  predictor  of  performance  in  these  military 
occupational  specialties." 

This  report  then,  is  very  similar  in  conclusion  to  the 
tank  gunnery  report,  in  which  AFQT  was  shown  through 
regression  to  have  a  significant  and  measurable  effect  on 
soldier  performance  in  skill  related  tasks. 

The  last  study  reviewed  is  also  from  the  collection  found 
in  the  Defense  Manpower  Study.  [Ref.  11]  The  topic  for  this 
study was the estimation of promotion rate. It is
the study most similar to the central theme of this thesis.
Using AFQT as one of the independent variables, a duration
model  is  applied  to  estimate  the  expected  speed  of  promotion. 
This  model  was  applied  within  two  categories,  the  paygrade 
and the career field of the NCOs. This promotion estimation
study approaches the aggregation of data in a different
manner as well. Specifically, by evaluating the possibility
of promotion for each individual over a series of years, the
dimension of time was entered into analysis. A significant
advantage of including the time dimension was that changes in
the categorical levels of the population, such as race or
sex, could be accounted for.

The  methodology  used  in  the  promotion  estimation  study  is 
considerably  more  complex  than  in  the  previous  studies. 
Rather  than  using  standard  regression  models,  the  study  uses 
the  Generalized  Linear  Model  form.  Specifically,  the  form  of 
the  predictive  model  is  a  log  likelihood  function  using  the 
Weibull  shape  parameter.  The  explanatory  variables  include 
education, AFQT, marital status, race, number of dependents,
time in service, sex, and high school completion status. The
Weibull model is better suited to explanatory variables
which are not continuous, such as sex, high school
completion status, and marital status.
Additionally, the residuals are not required to satisfy
normality assumptions, and there is therefore less
subjectivity in judging the appropriateness of the model with
respect to the independent variables. This method, however,
does not consider any in-service information and was
calculated only for very specific CMF and paygrade
combinations. The results are summarized as follows:

"A  review  of  these  promotion  results  reveals  two 
trends.  First,  even  after  controlling  for  high  school 
diploma  status,  AFQT  Category  I-IIIA  soldiers  are 
promoted approximately 10% more rapidly than IIIB
soldiers.  Second,  high  school  completion  is  less 
important  than  AFQT  score  in  determining  promotion  rates. 
The  remarkable  aspect  of  this  last  result  is  that 
educational  attainment  is  an  explicit  part  of  the  Army's 
promotion  point  system,  while  AFQT  scores  are  not.  These 
trends  are  true  for  both  promotion  to  E-5  and  promotion 
to E-6."
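
To make the duration-model idea concrete, the following is a minimal sketch of a Weibull log likelihood with right-censored records. The exact parameterization used in the cited study is not reproduced in this thesis, so the form below (hazard equal to shape times exp(x . beta) times t raised to shape minus one) is an assumption, as are the function name `weibull_log_likelihood` and the example records.

```python
import math

def weibull_log_likelihood(durations, promoted, covariates, beta, shape):
    """Log likelihood of a Weibull duration model with right censoring.

    Assumed hazard: h(t | x) = shape * exp(x . beta) * t ** (shape - 1).
    promoted[i] is 1 if the promotion was observed and 0 if the record
    was censored (not yet promoted when observation ended).
    """
    total = 0.0
    for t, d, x in zip(durations, promoted, covariates):
        log_scale = sum(b * xi for b, xi in zip(beta, x))
        # Cumulative hazard up to time t; its negative is the log survival.
        cumulative_hazard = math.exp(log_scale) * t ** shape
        if d:
            # Observed promotions also contribute the log hazard at t.
            total += math.log(shape) + log_scale + (shape - 1.0) * math.log(t)
        total -= cumulative_hazard
    return total

# Hypothetical records: years to promotion, promotion observed?, [intercept, AFQT / 100].
durations = [2.0, 3.5, 4.0]
promoted = [1, 1, 0]
covariates = [[1.0, 0.80], [1.0, 0.45], [1.0, 0.30]]

print(weibull_log_likelihood(durations, promoted, covariates, beta=[-2.0, 1.0], shape=1.5))
```

In practice the coefficients and the shape parameter would be chosen to maximize this function. Categorical covariates such as sex or diploma status enter simply as 0/1 columns of the covariate vector.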

As considerable attention has already been given to the
topic of relating measures of intelligence to performance,
and since positive results have generally been found,
one might wonder why another study should be undertaken.
First, this thesis is in response to a request by the Office
of the Deputy Chief of Staff for Personnel (ODCSPER) for
further research in the relationship of AFQT to success in
the Army. Secondly, this thesis is different in its
approach and analytical procedures. Following is a list of
the unique characteristics of this thesis:

1.  The  perspective  of  this  thesis  is  that  the  results  will 
be  used  as  a  management  tool,  or  as  an  explanatory 
method  for  active  duty  Army  personnel.  In  that  light, 
the  study  utilizes  information  collected  from  the 
individual's  in-service  record,  such  as  his  Skill 
Qualification  Scores,  and  his  NCO  Schooling  levels. 
Similar  to  accession  related  studies,  this  analysis 
includes  intelligence,  academic,  and  categorical 
information  as  potential  explanatory  variables. 
However,  the  intent  is  not  to  justify  accession  of  high 
quality  soldiers,  but  to  investigate  the  trends  of 
promotion  for  active  duty  personnel  as  a  function  of 
available  personnel  data. 

2.  This study conducts significant investigation into the
data to identify and correct anomalies which would confound
the relationship in question.

3.  Statistical analysis is done from the bottom up,
rather than by direct movement into regression models.
This approach finds that strict parametric models are
subject to error due to the inability of some data
variables to meet distributional assumptions necessary
for parametric analysis. The study then moves to
nonparametric means to approach the issue.

4.  For regression models, given the cautions on their use,
an additional sample population is tested using the
model. Thus, the results from the initial model can be
considered to have more believability and fidelity than
a model based on analysis of a single population
sample.

5.  The use of a large data set.*

6.  Several explanatory variables have been made
available from the DMDC data base which have not been
used in previous studies. They include the initial
education at time of entry, NCO education level, and a
race variable with six categories.

7.  This study uses graphical methods for depiction of many
of the methods of analysis.


*Study number four from the Defense Manpower Study uses both
large data sets and promotion as an independent variable.

III.   OVERVIEW OF THE DATA

A.   INTRODUCTION 

A  critical  aspect  of  this  thesis  was  the  selection  and 
screening  of  data.  Two  general  guidelines  were  applied  in 
creating  the  data  set.  First,  the  data  set  had  to 
demonstrate  a  level  of  homogeneity  in  that  the  NCO's 
considered  would  all  have  served  under  similar  enlistment  and 
advancement  policies.  Secondly,  the  selection  of  individual 
records  needed  to  be  random  and  without  unintentional  bias  to 
meet  the  requirements  for  a  representative  sample  set. 
Section  III  C.  describes  in  detail  the  measures  taken  to 
insure  that  the  above  two  attributes  were  established  in  the 
study  data  set. 

Recoding of data values into numerical equivalents was
required for several personnel record fields. As an example,
the level of Military Schooling, which is the NCO's
in-service schooling level, was recorded as mixed alpha-numeric
characters. Transformation involved rank ordering the
available levels of schooling in ascending hierarchical order
and substituting a numeric value for the alpha-numeric value.
Chapter IV discusses in detail the background of each
variable. Finally, as a check on the effects of manipulating
and restricting the sample data set, Section III D. provides
a comparison of statistics for the entire U.S. Army NCO
database versus the sample data set used in this thesis.

B.    DESCRIPTION OF THE VARIABLES

The data variables used in this study fall into three
categories: control variables, intelligence variables, and
promotion variables. The first two categories, control and
intelligence, were used as explanatory variables, while the
promotion variables were used as the dependent variables. A
brief description of each variable is tabulated in Table I.


TABLE I

Summary of Variables in Sample

Variable  Category   Meaning                            Value     Scale

Dependent

PRATE     Promotion  Raw Promotion Rate: number of      .041-.21  Ratio
                     promotions per month to most
                     recent promotion
RATE      Promotion  Promotion rate difference from     2.2-9.4   Ratio
                     average for that paygrade
                     (normalized)
PRA       Promotion  Promotion rate difference from     3.4-8.0   Ratio
                     average for that paygrade and
                     CMF (normalized)

Explanatory

SEX       Control    Male/Female                        0/1       Nominal
CMF       Control    Career Management Field            11-99     Nominal
RACETH    Control    Race/Ethnic group                  1-5       Nominal
PAYGD     Control    Paygrade                           5-7       Ordinal
GTSCR     Intell     General Intelligence Score         0-160     Ordinal
AFQTP     Intell     Armed Forces Qualification Test    1-100     Ordinal
                     Score Percentile
OAFQTP    Intell     Same as AFQTP, referenced on       1-100     Ordinal
                     1980 population
EIMCAT    Intell     Mental Category; based on OAFQTP   1-8       Ordinal
HIYRED    Intell     Highest Year of Education upon     1-12      Ordinal
                     entry into Army
EDLVL     Intell     Present Education Level            1-12      Ordinal
NCOE      Intell     Military Education Level Attained  0-13      Ordinal
PQSCR     Intell     Army Proficiency Test              0-100     Ratio

A more detailed description of each of the study
variables will be given in the first part of Chapter IV,
Successive Data Analysis.


C.   PREPARATION  OF  THE  DATA 

Preparation  of  the  data  began  with  acquiring  fifty 
thousand  records  from  the  U.S.  Army  Military  Personnel  Center 
in  Alexandria,  Virginia.  Initial  restrictions  on  the  data 
were  established  to  allow  inclusion  of  only  NCO's  with  a  date 
of  entry  after  January  1,  1976.  Further,  NCO's  selected  had 
to  be  members  of  the  Regular  Army,  and  not  Reserve  or 
National  Guard  forces.  These  restrictions  provided  for 
observation  of  only  those  NCO's  who  were  recruited  a 
reasonable  time  period  following  the  ending  of  the  Viet  Nam 
War,  and  following  the  establishment  of  the  All-Volunteer 
Force.  Restricting  the  NCO's  to  Regular  Army  soldiers 
focused  the  study  on  the  standing  forces  alone,  and  avoided 
confounding  as  a  result  of  different  promotion  and  accession 
policies  in  the  Reserve  and  Guard  Forces. 

The  records  requested  were  randomly  drawn  by  taking  every 
fifth  individual  from  an  estimated  population  of  250,000 
meeting  the  above  restrictions.  The  fifty  thousand  MILPERCEN 
records  were  then  matched  and  merged  with  a  similar  personnel 
database  from  the  Defense  Management  Data  Center  (DMDC) 
Monterey, California. The DMDC database holds additional
information, including: the ability to distinguish high
school equivalency certificate holders from actual graduates,
the highest year of education of the soldier at time of
enlistment, and AFQTP and EIMCAT scores renormed for a 1980
population.
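The draw-and-merge step described above can be sketched as follows. This is a minimal illustration in Python, not the procedure actually run at MILPERCEN or DMDC. The field names (`service_no`, `afqtp`, `hiyred`), the toy records, and the choice to start the every-fifth draw at the first record are all assumptions made for the example.

```python
def systematic_sample(records, step=5, start=0):
    """Take every `step`-th record, beginning at index `start`."""
    return records[start::step]


def merge_on_key(left, right, key="service_no"):
    """Inner join of two lists of dicts on `key`; unmatched records are dropped."""
    right_by_key = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            merged.append({**row, **match})
    return merged


# Hypothetical population of 20 personnel records (illustrative values only).
population = [{"service_no": n, "afqtp": 40 + n} for n in range(20)]
sample = systematic_sample(population, step=5)

# Hypothetical second database holding an extra field for some of the same people.
dmdc = [{"service_no": n, "hiyred": 6} for n in range(0, 20, 10)]
merged = merge_on_key(sample, dmdc)
```

Taking every fifth record is a systematic draw; it behaves like a random sample only if the ordering of the source file is unrelated to the variables under study.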

After the merging, data records which had missing values
in any of the critical variable fields were dropped. There
were approximately ten thousand records missing critical
data. Following initial analysis of promotion rates, two
additional restrictions were applied against the remaining
records.

First,  a  grouping  of  several  hundred  promotion  rates 
showed  that  individuals  had  been  promoted  to  the  rank  of  E-5 
at  rates  which  were  as  high  as  one  promotion  per  month. 
Cross referencing of service numbers identified this subgroup
as NCO's who had served in Reserve or Guard units and
who,  for  a  variety  of  reasons,  had  been  called  for  active 
duty.  As  such,  they  were  allowed  by  regulation  to  carry  with 
them  an  accelerated  promotion  to  their  former  rank. 
Subsequently,  a  serial  number  match  and  elimination  was  done 
for  all  NCO's  with  recent  listing  as  Reserve  or  Guard  status. 

A  second  source  of  unusual  promotion  rates  at  the  E-5 
level  became  apparent  in  some  of  the  more  technically 
oriented  career  management  fields,  the  medical  field  in 
particular.  Research  into  Army  special  recruitment  policy 
indicated  that  during  the  early  1980's  special  provisions 
were  made  to  allow  persons  with  background  ability  in  certain 
technical fields to enter the Army and be promoted to NCO
status within six months, or in certain cases to receive NCO
status immediately following basic training.¹ To correct for
these anomalies, all promotion rates which fell outside the
maximum time periods considering application of both waivers
were discarded.

D.   COMPARISON  TO  TOTAL  ARMY  STATISTICS 

In  this  section,  selected  attributes  of  the  sample  data 
set  and  the  complete  U.S.  Army  database  are  briefly  compared, 
with  the  intent  of  checking  the  representativeness  of  the 
sample  set. 

Population attributes such as distribution of sex, Career
Management Fields, and paygrade were obtained from the
complete U.S. Army database records consisting of over
250,000 NCO's.

As described in Section III C, the sample data set of
50,000 selected records had been filtered to contain only
personnel who entered the Army after 1976. Screening of
those 50,000 records for completeness of data and uniformity
of promotion policy reduced the number in the sample set to
approximately 38,000. It was prudent, then, to check the
final sample set to see if it retained its representative
character as a random sample. It should be noted, however,
that this comparison will not occur for all study variables.
Reasons for this include non-availability of records from the
MILPERCEN database, and cases where the statistic was
produced through computation by the author, promotion rates
being the principal example.


¹ MSG Knopp, NCOIC Defense Management Data Center, West.
El Estero Drive, Monterey CA 93946.

1.   Comparison of Army versus Sample Summary Statistics

Formal  hypothesis  testing  for  means  or  distributions 
with  ANOVA  was  unavailable  due  to  computational  and  software 
restrictions.  However,  since  the  intent  of  this  section  was 
simply  to  identify  any  population  shifts,  and  the  magnitude 
of  those  shifts,  observation  of  summary  statistics  is  assumed 
to  be  sufficient.  Specifically,  the  means  and  the  standard 
deviations  of  four  variables  were  obtained  from  both  the 
entire  NCO  population  data  set  and  the  thesis  sample  data 
set.  The  percent  difference  between  the  variable  means  was 
computed  and  expressed  relative  to  the  thesis  sample  data.  A 
table  of  comparative  statistics  and  the  percent  difference  is 
shown  in  Table  II. 


TABLE II

Total Army vs Sample Summary Statistics

               Total Army           Sample
Sample Size    (250,000)            (37,854)            Percent
Variable       Mean    Std Dev      Mean    Std Dev     Difference

AFQTP          48.3    25.2         53.4    20.9        Sample 10%  >
SEX            1.09    .283         1.12    .328        Sample 2.7% >
RACETH         1.63    .991         1.65    .942        Sample 1.2% >
PAYGD          5.75    .597         5.27    .464        Sample 5.2% <

The three variables AFQTP, SEX, and PAYGD show
noticeable changes between the Sample and the Total Army,
while the RACETH variable does not appear to have been
affected much by sampling. A closer look at the discrete
distributions, and an overall conclusion about differences in
the two data sets, follows.
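The percent-difference column of Table II can be recomputed with a few lines of Python. The sketch below assumes that "expressed relative to the thesis sample data" means the difference in means divided by the sample mean; that reading, the function name, and the output format are assumptions, and only three of the tabulated variables are used.

```python
def percent_difference(sample_mean, army_mean):
    """Percent difference between means, expressed relative to the sample mean."""
    return 100.0 * (sample_mean - army_mean) / sample_mean


# (Total Army mean, sample mean) as tabulated in Table II.
means = {
    "AFQTP": (48.3, 53.4),
    "SEX": (1.09, 1.12),
    "RACETH": (1.63, 1.65),
}

for name, (army, sample) in means.items():
    diff = percent_difference(sample, army)
    direction = ">" if diff > 0 else "<"   # ">" means the sample mean is larger
    print(f"{name:7s} Sample {abs(diff):.1f}% {direction}")
```

Dividing by the Total Army mean instead would change the figures only slightly, so the choice of base matters less than stating it consistently.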

2.   Discrete Distributions

Figures 3.1 and 3.2 illustrate differences in the
discrete distributions for paygrade and race respectively.
Both plots are clustered bar charts in which the percentage
of each level of the discrete variable is plotted side by
side for the Total Army and the Sample.

[Figure 3.1: Army vs Sample Paygrade Percentages, cluster bar
chart. Horizontal axis: PAYGRADE VALUES (E-5, E-6, E-7).]

[Figure 3.2: Army vs Sample Race Percentages, cluster bar
chart. Horizontal axis: RACETH VALUES (White, Black, Hispanic,
Indian, Asian, Other).]

Observation of the tabular data and bar charts shows
that there are some differences between the two populations.
Specifically, the sample contains more lower ranking
personnel, slightly more women, and significantly higher
AFQTP related scores. The racial make-up of the sample
appears to be similar.

The restriction of random sampling to only those persons
entering the service after 1976 can directly or indirectly
explain these differences. First, the lower average paygrade
is a direct result of promotion policy, in which it is
impossible to achieve a rank above E-7 in less than ten
years. Hence, the sample population should demonstrate a
lower average paygrade. Secondly, the slight increase in the
proportion of women might be explained by a general opening
up of the services to women in the late seventies and early
eighties. Thirdly, the higher AFQTP is a direct result of
policy restrictions begun in Fiscal Year 1981, and formalized
by the 1984 Defense Authorization Act. This placed quality
constraints on AFQT Category and high school diploma status.
[Ref. 10: sec 1-0, p. 1] Whether these restrictions, or the
general improvement of social acceptance of the military
services, resulted in this AFQT improvement is a question
which would require significant study in itself.

In short then, the sample is different in several ways
from the total NCO population. It should be noted, however,
that these results are intentional. The shifts caused by
restricting the sample to after 1976 are felt to be less
dangerous to the study than the alternative of including
soldiers who were accessed during the draft and the era of
Viet Nam War policies. Finally, it is only a matter of time,
unless significant changes in accession and promotion policy
occur, before the character demonstrated by the sample data
set will constitute the norm for all NCOs. Thus, it is
concluded that the study sample is satisfactory.

IV.   SUCCESSIVE DATA ANALYSIS

A.   INTRODUCTION 

In this chapter the results of a systematic method for
data analysis will be reported. This method of analysis
followed a format which is described by Chambers in Graphical
Methods for Data Analysis. [Ref. 12] This procedure develops
an understanding of the data, beginning with simple
univariate descriptive procedures, then progressing through
several increases in dimensionality of variables, and finally
into the more complex inferential procedures of model
building and multivariate regression. An abbreviated outline
of this procedure is shown below.

1.  Analysis of single variables.
2.  Comparison of variable distributions.
3.  Analysis of paired variables.
4.  Multivariate graphical analysis.
5.  Linear Models including:
    a.  Simple Regression
    b.  Multivariate Models

In addition to these steps, this procedure will be
supplemented with several non-graphical measures, such as
ANOVA, ANCOVA, and several tabular nonparametric methods. It
should be noted that this analysis reports only those
procedures which are considered an essential step in
investigation, or whose results provided an observation of
merit. Many available procedures have not been used in this
chapter, as a consequence of the data failing to meet
distributional assumptions, and for other reasons which would
make such analysis inappropriate. During the development of
this chapter, the results of each level of analysis will
specify why the next set of analysis procedures was pursued.
Alternatively, if a popular class of procedures is
disregarded, the logic for disregarding it is explained.

The  objective  of  detailing  this  procedure  is  to  present  a 
thorough  depiction  of  the  nature  of  the  variables,  and  to 
explain  the  development  of  resulting  inferences  and  models. 

B.   UNIVARIATE  ANALYSIS. 

1.   Dependent Variables
a.   PRATE

(1)  General. The variable PRATE represents the
raw promotion rate of a particular individual. Numerically,
it is the total number of promotions per month up to the most
recent promotion.

(2)  Value. The variable PRATE was computed
using data obtained from the DMDC database. The time to most
recent promotion in months was found by subtracting the basic
pay entry date from the date of latest award of rank. This
number then became the denominator of a ratio having the
individual's rank, or equivalently, the total number of
promotions the individual has received, as the numerator:

\[
\mathrm{PRATE} = \frac{\text{Individual's Latest Rank}}
{(\text{Award Date of Latest Rank}) - (\text{Date of Entry in Army})}
\]

Ranks were numerically represented with a score of 5 for
an E-5 Sergeant, and with 6 and 7 for values of the next two
ranks. The resulting units of measurement for the PRATE
variable were: units of promotion per month of service.
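As a worked illustration of the PRATE ratio, the sketch below computes it for one hypothetical soldier. The function names, the example dates, and the decision to count whole calendar months while ignoring the day of the month are assumptions; the thesis states only that the two dates were subtracted to give months.

```python
from datetime import date


def months_between(start, end):
    """Whole calendar months from `start` to `end` (day of month ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def prate(paygrade, entry_date, latest_rank_date):
    """Raw promotion rate: numeric rank divided by months of service to that rank."""
    months = months_between(entry_date, latest_rank_date)
    if months <= 0:
        raise ValueError("award date must fall after the entry date")
    return paygrade / months


# Hypothetical soldier: entered January 1977, promoted to E-5 in January 1981.
example = prate(5, date(1977, 1, 15), date(1981, 1, 10))
```

For this hypothetical record the ratio is 5 promotions over 48 months, or roughly 0.10 units of promotion per month of service.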

(3)  Attributes  of  the  Variable.  The  variable 
PRATE  qualifies  as  a  continuous  variable  with  a  ratio  scale. 
The  continuous  nature  of  the  variable  relies  on  the  fact  that 
the  number  of  months  service  combined  with  three  rank 
structures  yields  sufficient  combinations  of  values,  actually 
190  in  all,  to  use  as  measures. 

There are some inherent problems with the raw PRATE
score, since promotion policies are in effect which set
minimum time thresholds for promotion. Thus, the promotion
rate of an individual who is presently an E-5 will not be
comparable to the promotion rate of an E-7 whose three
promotions have been affected by the minimum time policy.
Generally, the minimum time in service between promotions
grows as rank increases, and more senior soldiers will
normally have lower raw promotion rates.

A  second  source  of  bias  is  potentially  found  in  the 
Career  Management  Field  (CMF)  of  the  soldier.  Army  promotion 
policy  is  based  on  a  system  of  minimum  performance  points  to 
be  attained  within  a  CMF  in  order  to  be  considered  for 
promotion.  Generally,  the  more  technical  fields  will  have 
higher  promotion  point  thresholds   than  non-technical  fields. 

The distribution of the variable PRATE and its summary
statistics are shown in Figure 4.1. The shape of the
histogram is positively skewed, demonstrating a steep
ascending slope in the first partitions, then a generally
flat shape until just past the median value. After the
median value, a gradual downward sloping tail occurs. A
rough interpretation of this shape is that there appear to
be a few individuals who are promoted at very fast rates,
followed by a block of average promotion rates, then a
diminishing tail of individual promotion rates which fall to
the right of the seventy-fifth percentile.

Figure 4.1

Distribution transformation of this variable was not
attempted, primarily because its usefulness in testing or
modelling is limited by the problems associated with the bias
factors described above.

b.   RATE

(1)  General. The variable RATE is a re-expression
of the variable PRATE. It has bias due to
individual rank removed by normalizing each individual score
relative to his or her paygrade.

(2)  Values. To compute the variable RATE, the
average PRATE value for each paygrade was calculated, as well
as the standard deviation for that paygrade. Individual
scores were then normalized by the transformation:

\[
\mathrm{RATE}_i = \frac{\mathrm{PRATE}_i - \text{AVERAGE for that Rank}}
{\text{STANDARD DEVIATION for that Rank}}
\]


(3)  Attributes  of  the  Variable.  The  variable 
RATE  is  also  a  continuous  ratio  scale  variable,  as  it  is  a 
transformation  of  PRATE. 

The  removal  of  influence  due  to  rank  was  confirmed  by 
computing  the  correlation  coefficient  between  the  variables 
RATE  and  PAYGD.  As  seen  in  Table  X,  a  value  of  near  zero 
resulted  where  the  previous  correlation  coefficient  for  PRATE 
and  PAYGD  had  been  -.495.  Thus,  the  transformation  to  RATE 
from  PRATE  results  in  a  variable  independent  of  PAYGD. 
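The normalization and the correlation check just described can be sketched in Python as follows. The data are invented toy values, the helper names are hypothetical, and the use of the population standard deviation (`pstdev`) rather than the sample form is an assumption, since the text does not say which was used.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_within_groups(values, groups):
    """Return (value - group mean) / group standard deviation for each observation."""
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    stats = {g: (mean(v), pstdev(v)) for g, v in by_group.items()}
    return [(v - stats[g][0]) / stats[g][1] for v, g in zip(values, groups)]


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Illustrative PRATE values in which higher paygrades have lower raw rates.
paygd = [5, 5, 5, 6, 6, 6, 7, 7, 7]
prate = [0.10, 0.12, 0.14, 0.07, 0.08, 0.09, 0.05, 0.055, 0.06]

rate = normalize_within_groups(prate, paygd)
r_raw = correlation(prate, paygd)    # strongly negative for these toy values
r_norm = correlation(rate, paygd)    # essentially zero after normalization
```

Because every paygrade group is centered on zero, the normalized scores carry no linear relationship with paygrade. The same helper would serve for a normalization keyed on a (CMF, paygrade) pair by passing tuples as the group labels.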

The distribution shape of the RATE histogram, shown in
Figure 4.2, appears slightly non-normal, but a check of the
summary statistics for quantiles shows that they correspond
closely to the standard normal quantiles. Thus, the
assumption of normality for procedures using this variable is
still reasonable, based on observation of the distribution
shape and the close agreement of quantile values.

Figure 4.2 presents a histogram and summary statistics for
the RATE variable.

Figure 4.2

c.   PRA

(1)  General. The variable PRA is another
recomputation of the raw promotion rate. PRA controls for
the career management field as well as paygrade. It is a set
of normalized promotion scores which are independent of
PAYGD and CMF. Independence of PRA from these variables was
confirmed by checking correlation coefficients. Both
variables CMF and PAYGD had near zero values of correlation
with PRA.

(2)  Values. The variable PRA was computed in the same
manner as RATE; however, a mean and standard deviation for
each CMF and PAYGD combination were computed and used in
the normalization equation.

(3)  Attributes. PRA is a continuous variable
with a ratio scale. The distribution of PRA appears normal,
with the quantile values very close to the standard normal.

Figure 4.3

A comparison of percentiles for the PRA distribution
versus the standard normal distribution is shown in Table III.
Specifically, the PRA percentile values are listed with the
corresponding standard normal percentile values for the same
data point. For example, -1.5510 is the PRA fifth percentile,
while -1.5510 indexed in a standard normal table results in
a six percent value.

TABLE III

Comparison of PRA vs Standard Normal Percentiles

PRA       Standard Normal

 5%        6%
25%       22.6%
50%       48.4%
75%       75.7%
95%       96.3%

Normality  for  this  variable  will  be  assumed  based  on 
general  distribution  shape  and  the  close  correspondence  of 
the  data  percentiles  to  the  standard  normal  percentiles. 
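The table lookup behind this comparison can be reproduced with the Python standard library. The sketch below evaluates the standard normal cumulative distribution at the one PRA percentile value quoted in the text (-1.5510); the function name is hypothetical, and the remaining PRA percentile values are not given in this section, so they are not filled in.

```python
from statistics import NormalDist


def normal_percentile(z):
    """Percent of a standard normal distribution lying at or below `z`."""
    return 100.0 * NormalDist().cdf(z)


# The PRA 5th-percentile value quoted in the text.
pra_fifth_percentile = -1.5510
print(f"{normal_percentile(pra_fifth_percentile):.1f}%")
```

Repeating the lookup for each observed PRA percentile value would give the Standard Normal column of the comparison table.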

2.   Control Variables

d.  SEX

The variable SEX is discrete and nominal. Males
are represented by a numerical value of one, and females are
represented with a two. In the study sample, 12.29 percent
of the sample was female, and 87.71 percent was male.

e.  CMF 

Career  Management  Field  (CMF)  is  a  discrete 
variable  with  nominal  scale.  Thirty  three  CMF's  are 
represented  in  the  sample.  Each  Career  Management  Field  is 
assigned  a  numerical  value,  for  example,  the  Infantry  branch 
is  designated  as  CMF  11.  These  assignments  are  a  Department 
of  the  Army  numbering  system,  and  can  be  reviewed  along  with 
the  CMF  percentage  and  frequency  table  in  Appendix  A. 

There is some ordinal information in the numbering
system; for instance, low CMF numbers are indicative of a
combat branch, such as Infantry or Armor. Center CMF values
are indicative of combat support branches, such as Signal and
Chemical. Upper CMF values are from the combat service
support branches, such as Medical and Language Specialist.

Figure 4.4, the CMF histogram, does reflect the
distribution of the three general groupings of CMF densities:
combat, combat support, and combat service support. The
combat and combat support values have roughly equivalent
representation, while the upper numbered service support
CMF's are about two thirds the size of the other groups.

Figure 4.4

f.   RACETH

The race-ethnic variable is a discrete, nominal
variable. The values represented and their percentages are
shown in Table IV.

TABLE IV   Sample Race Percentages

                                                   Cumulative
Value   Race                             Percent   Percent

1       White                            52.43      52.43
2       Black                            38.59      91.02
3       Hispanic                          5.58      96.6
4       American Indian/Alaskan Native     .26      96.86
5       Asian/Pacific Islander            1.15      98.01
6       Other/Unknown                     1.99     100.00

g.   PAYGD

Paygrade is a discrete, ordinal variable. The
selection of NCO rank from personnel enlisting after 1976
resulted in representation by paygrades E-5 through E-7 only.
The distribution of PAYGD is shown in Table V.


TABLE V   Sample Paygrade Percentages

                                       Cumulative
Value   Rank                 Percent   Percent

5       Sgt E-5              73.29      73.29
6       Staff Sergeant E-6   25.89      99.19
7       SFC E-7               0.81     100.00

The 0.81 percent for E-7 results in only 307 SFC's in the
sample. Despite the preponderance of representation by the
other ranks, a sample size of 307 for the E-7 rank still
allows for adequate representation of that subcategory.
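The paygrade percentages can be cross-checked against the sample summary statistics with a short calculation. The sketch below is illustrative only: the function name is hypothetical, and rescaling the percentages so they sum to exactly one is an assumption made to absorb rounding in the table.

```python
def discrete_summary(percent_by_value):
    """Mean and standard deviation of a discrete variable given its percentages."""
    total = sum(percent_by_value.values())
    probs = {v: p / total for v, p in percent_by_value.items()}
    mean = sum(v * p for v, p in probs.items())
    variance = sum(p * (v - mean) ** 2 for v, p in probs.items())
    return mean, variance ** 0.5


# Paygrade percentages from Table V and the sample size from Table II.
paygd_percent = {5: 73.29, 6: 25.89, 7: 0.81}
sample_size = 37854

mean, std_dev = discrete_summary(paygd_percent)
e7_count = round(sample_size * paygd_percent[7] / 100)
```

Under these assumptions the results should land close to the sample PAYGD mean and standard deviation reported in Table II and to the count of E-7 records quoted in the text.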
3.   Intelligence and Academic Scores
h.   GTSCR

The General Intelligence Test Score (GTSCR) of
the individual is a continuous variable with at least an
ordinal scale. The range of values runs from 50 through 160.
The lower value of 50 represents the corresponding minimum
score of ASVAB modules that would allow for enlistment in the
Army. The histogram of the GTSCR variable, shown in Figure
4.5, is approximately normal. Checking the quantiles shows a
larger density in the distribution to the left of the mean,
with slightly lower values for quantiles right of the mean.

HISTOGRAM TABLE

X                 : GTSCR
SELECTION         : ALL
X LABEL           : GTSCR
NO. OF ELEMENTS   : 37854
X MEAN            : 108.23
STD. DEVIATION    : 14.275
SKEWNESS          : 0.129
KURTOSIS          : 3.3632
5-PERCENTILE      : 84
25-PERCENTILE     : 99
MEDIAN            : 109
75-PERCENTILE     : 117
95-PERCENTILE     : 130
X MIN.            : 54
X MAX.            : 156

Figure 4.5
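The skewness and kurtosis entries in a histogram table of this kind can be computed from central moments. The package that produced the tables is not identified in this section, so the formulas below are an assumption: moment-based estimates with kurtosis reported on the scale where a normal distribution scores 3, which is consistent with the value of about 3.4 shown for the approximately normal GTSCR. The function name and the toy data are hypothetical.

```python
from statistics import mean


def moment_summary(data):
    """Standard deviation, skewness, and kurtosis from central moments."""
    n = len(data)
    m = mean(data)
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    return {
        "std_dev": m2 ** 0.5,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,   # not excess kurtosis; a normal curve gives 3
    }


# Small symmetric toy sample (not thesis data).
summary = moment_summary([1, 2, 3, 4, 5])
```

A symmetric sample gives zero skewness, while positive values indicate a longer right-hand tail.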


i.   AFQTP 

The  Armed  Forces  Qualification  Test  Percentile  is 
a  continuous  variable  with  ordinal  scale.  Its  value 
represents  the  relative  standing  of  an  individual's  test 
score  referenced  against  a  1944  population.  This  means  that 
an  individual's  raw  AFQT  score  is  compared  against  a  standard 
table  of  values  that  was  developed  in  1944.  This  table  of 
values  from  1944  was  designed  to  represent  the  distribution 
of  raw  AFQT  test  scores  for  the  entire  1944  American  youth 
population. Hence, a resulting individual AFQT score is
simply the corresponding percentile of the individual raw
AFQT score relative to the entire 1944 population AFQT test
distribution.

The histogram and summary statistics for AFQTP are shown
in Figure 4.6. The density of AFQTP is partially symmetric
about the mean. The fifth percentile is at a
value of 21, demonstrating the restriction applied to CAT V
and VI personnel since 1980. Use of the AFQT score for this
study is primarily for comparative reasons. AFQT cannot be
used in any developed model since scoring against the 1944
reference population has ceased. As will be seen in
subsequent chapters, AFQT was discarded anyway when OAFQT
proved to be a better explanatory variable.

HISTOGRAM TABLE

X                 : AFQTP
SELECTION         : ALL
X LABEL           : AFQTP
NO. OF ELEMENTS   : 37854
X MEAN            : 53.419
STD. DEVIATION    : 20.965
SKEWNESS          : 0.29913
KURTOSIS          : 2.2128
5-PERCENTILE      : 21
25-PERCENTILE     : 37
MEDIAN            : 50
75-PERCENTILE     : 68
95-PERCENTILE     : 91
X MIN.            : 10
X MAX.            : 99

Figure 4.6
j.   OAFQTP 

The OAFQTP variable is a continuous variable with
ordinal scale. It is fundamentally the same as the AFQTP
variable, except for the reference for measurement, which is
a 1980 population. The distribution for OAFQTP is considerably
more dense in the lower values than AFQTP. Explanation of
this shift can be seen by reviewing the transformation tables
in Appendix A for converting 1944-based scores to 1980
scores. The transformations for values below 80 reduce a
1944-based score in almost every case. The
amount of reduction varies, but it can be as much as four
points. Only when the scores go above 85 are there any
increasing transformations.

OAFQT HISTOGRAM AND STATISTICS
(N=37854)

HISTOGRAM TABLE

X                 : OAFQTP
SELECTION         : ALL
X LABEL           : OAFQT
NO. OF ELEMENTS   : 37854
X MEAN            : 45.319
STD. DEVIATION    : 24.779
SKEWNESS          : 0.53139
KURTOSIS          : 2.1725
5-PERCENTILE      : 14
25-PERCENTILE     : 25
MEDIAN            : 41
75-PERCENTILE     : 64
95-PERCENTILE     : 92
X MIN.            : 1
X MAX.            : 99

Figure 4.7
k.   EIMCAT 

EIMCAT is the mental category of an individual
based on the 1980 reference population AFQT test score.
EIMCAT is a discrete and ordinal scale variable. The
assignment of categories is a Department of Defense standard,
and is a common reference for all services. The breakdown of
values is as follows:


TABLE VI

Sample Mental Category Percentages

                                        Cumulative
Value   Category    AFQT     Percent    Percent

1       Cat V       01-09      .33         .33
2       Cat IV C    10-15     6.736       7.067
3       Cat IV B    16-20     9.788      16.854
4       Cat IV A    21-30    19.187      36.041
5       Cat III B   31-49    26.116      62.157
6       Cat III A   50-64    13.053      75.21
7       Cat II      65-92    19.99       95.2
8       Cat I       93-99     4.8       100.000

A histogram of the EIMCAT values follows in Figure 4.8.

Figure 4.8

Observation of the above figures demonstrates more
clearly that the EIMCAT categories do not span equal ranges
of the OAFQT scale. For example, the center EIMCAT, value
five, spans almost twenty points, while EIMCAT eight contains
only the upper seven point scores. EIMCAT does, however, make
available an established, discrete scale measurement
representing intelligence test scores for use in appropriate
statistical procedures.
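
As an illustration only, the score bands of Table VI can be written as a small lookup. The sketch below is not part of the thesis analysis; the names EIMCAT_BANDS, eimcat_from_oafqtp and band_width are hypothetical, and the code assumes integer percentile scores from 1 to 99. It shows the unequal band widths noted above.

```python
# Hypothetical lookup built from the score bands listed in Table VI.
# Each entry: (EIMCAT value, category label, lowest score, highest score).
EIMCAT_BANDS = [
    (1, "Cat V", 1, 9),
    (2, "Cat IV C", 10, 15),
    (3, "Cat IV B", 16, 20),
    (4, "Cat IV A", 21, 30),
    (5, "Cat III B", 31, 49),
    (6, "Cat III A", 50, 64),
    (7, "Cat II", 65, 92),
    (8, "Cat I", 93, 99),
]


def eimcat_from_oafqtp(score):
    """Return (value, label) for an integer 1980-referenced percentile."""
    for value, label, low, high in EIMCAT_BANDS:
        if low <= score <= high:
            return value, label
    raise ValueError(f"score {score} is outside the 1-99 range")


def band_width(value):
    """Number of percentile points covered by one EIMCAT value."""
    for band_value, _label, low, high in EIMCAT_BANDS:
        if band_value == value:
            return high - low + 1
    raise ValueError(f"unknown EIMCAT value {value}")


# The sample median OAFQTP of 41 falls in the center category.
print(eimcat_from_oafqtp(41), band_width(5), band_width(8))
```

The widths returned for values five and eight correspond to the "almost twenty" and "seven" points mentioned in the text.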
l.   HIYRED

HIYRED is the highest year of education completed by
the individual upon entry into the Army. It is a discrete,
ordinal scale variable. The values and distribution
percentages are shown in Table VII.

TABLE VII   Sample Highest Year of Education

                                                   Cumulative
Value   Category                         Percent   Percent

1       1-7 Years                         0.018      0.018
2       8 Years                           0.153      0.172
3       1 Year High School                1.397      1.569
4       2 Years High School               4.7        6.269
5       3-4 Years HS (no diploma)         6.935     13.203
5.5     High School GED                   4.813     18.017
6       High School Diploma              71.274     89.29
7       1 Year College                    3.305     92.595
8       2 Years College                   3.453     96.048
9       3-4 Years College (no degree)     1.337     97.385
10      College Graduate                  2.560     99.945
11      Masters or Equivalent             0.05      99.995
12      Doctorate or Equivalent           0.005    100.000

m.   EDLVL 

EDLVL is the present level of education of the
individual. These scores are related to HIYRED, in that any
education completed by the individual subsequent to enlistment
is recorded in this variable. A GED equivalency is included in
the value of six for high school completion.


TABLE VIII   Sample Education Level Percentages

                                                   Cumulative
Value   Category                         Percent   Percent

1       1-7 Years                         0.042      0.042
2       8 Years                           0.011      0.053
3       1 Year High School                0.198      0.251
4       2 Years High School               0.793      1.043
5       3-4 Years HS (no diploma)         1.503      2.547
6       High School Diploma              80.443     82.99
7       1 Year College                    6.089     89.079
8       2 Years College                   5.828     94.907
9       3-4 Years College (no degree)     2.037     96.944
10      College Graduate                  2.948     99.829
11      Masters or Equivalent             0.1       99.992
12      Doctors or Equivalent             0.008    100.000

Figure 4.9 and the percentages in Table VIII show an upward
shift of education level after enlistment. This is made
possible, and encouraged, by official continuing education
and high school completion programs.

Figure 4.9
n.   NCOE 

The Noncommissioned Officer Education variable,
NCOE, is a discrete, ordinal scale variable. It reports
the level of military schooling accomplished by the
individual. Military schooling categories are generally
organized in three ascending levels: primary, basic and
advanced. At the two lower levels, primary and basic, there
are separate courses for combat and non-combat CMF's. In
some cases, there has been an award of an On-The-Job Training
(OJT) qualification. The OJT award is used to give credit to
an NCO who achieves technical competence in advance of being
eligible for promotion to the next higher paygrade.

As previously mentioned, attendance at military schools
is sometimes associated with an individual being previously
identified as a superior performer. This is true mostly in
the advanced level schools, where selection for attendance is
through Department of the Army Selection Boards. At the
primary level, local commanders have authority to establish
selection procedures and often will make primary school
attendance a locally mandatory requirement for junior NCOs.
Table IX and Figure 4.10 present the categories and
distribution of NCOE.


TABLE IX   Sample NCOE Percentages

                                                     Cumulative
Value   Category                            Percent  Percent

0       Nonparticipant                       21.19    21.19
1       Primary NCO Course (CBT CMF)          4.46    25.65
2       Primary Leadership Graduate          39.36    65.25
3       On-The-Job Credit for E-5 skills      5.38    70.63
4       Primary Technical Course Graduate     2.82    73.45
5       On-The-Job Credit for E-6 skills      0.0     73.45
6       Basic Technical Course Graduate       5.11    78.56
7       Basic NCO Course (CBT CMF)           15.99    94.55
8       On-The-Job Credit for E-7 skills       .01    94.56
9       Advanced NCO Course Selectee          2.28    96.84
10      Advanced NCO Course Graduate          3.06    99.89
11      Advanced NCO nongraduate, OJT          .01    99.9
12      On-The-Job Credit for E-8 skills       .06   100.00

Figure 4.10 presents a histogram of NCOE discrete levels.

Figure 4.10
o.   PQSCR 

PQSCR is a report of the Primary Military
Occupation Skill Qualification Test (SQT) score of the
individual. It is a continuous, ratio-valued variable.
The SQT is a service related test which is used to determine
the technical competence of a soldier. SQT score has been
used by promotion boards as a qualitative measure for
promotion. The numerical value represents the percent of
correct answers on a written and hands-on evaluation.
Separate SQT tests are written for each CMF, although the
structure of the tests is similar.

The distribution of PQSCR, shown in Figure 4.11, is more
dense in the upper values, with an abnormally long left tail
extending to a lower bound of 21. An explanation for the
shape of the PQSCR distribution is an involved topic, and has
itself been the subject of study. A general observation is
that PQSCR has previously been used in a manner where
individual soldier scores were often aggregated as a means of
comparison of the parent units of the soldiers. [Ref. 11:p. 43]
Thus, significant unit and individual training emphasis has
been focused on SQT testing in previous years, and pressure
to perform well was exerted by the parent organizations.
As a result, a negatively skewed distribution (skewness of
-0.708), rather than a normal distribution, is understandable.

HISTOGRAM TABLE

X  :PQSCR
SELECTION  :ALL
X  LABEL  :PQSCR
NO. OF ELEMENTS  :37854
X  MEAN  :78.384
STD. DEVIATION  :11.609
SKEWNESS  :-0.70832
KURTOSIS  :3.5739
5-PERCENTILE  :57
25-PERCENTILE  :71
MEDIAN  :80
75-PERCENTILE  :87
95-PERCENTILE  :95
X  MIN.  :21

Figure 4.11
3.   Summary

The fifteen variables used in this study demonstrate
a wide variety of characteristics. All of the dependent
variable choices were continuous, with two, RATE and PRA,
showing only slight departures from normality. The other
continuous variables did not have identifiable distributions,
and could not be transformed to normality using power or log
transformations. Nor is it entirely clear that one would need
to use a transformed variable in subsequent analysis.
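
The skewness and kurtosis values reported in the histogram tables of this chapter (for example, 0.53139 for OAFQTP and -0.70832 for PQSCR) can be read as moment ratios. The sketch below is illustrative only: the function name and the scores are made up, and it assumes the population (divide by n) form with non-excess kurtosis, so that a normal distribution would give a kurtosis of 3. The source does not state which convention its software used.

```python
import math


def moments_summary(values):
    """Mean, standard deviation, skewness and kurtosis from sample moments.

    Assumption: population (divide-by-n) moments and non-excess kurtosis.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "std": math.sqrt(m2),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }


# Made-up scores bunched near the top of the scale with a long left tail,
# loosely imitating the shape described for PQSCR.
left_tailed = [21, 45, 60, 70, 75, 78, 80, 82, 85, 87, 88, 90, 92, 95]
summary = moments_summary(left_tailed)
print(summary)
```

A long left tail yields a negative skewness, while a perfectly symmetric sample yields zero.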

The independent variables comprise a mixture of
continuous and discrete values, with both ordinal and ratio
scales. Within the independent variables there are two
principal sets of related variables. The intelligence test
scores, AFQTP, OAFQTP, EIMCAT, and to a lesser extent GTSCR,
are all derived from the ASVAB. These variables differ from
one another in varying degrees, and each is either a
re-expression, a transformation, or a similarly derived set
of scores.

The two academic performance measures, EDLVL and HIYRED,
are related, in that EDLVL is simply HIYRED plus any
additional schooling since entry into the Army.

Despite the similarities within these two sets of
variables, it is felt that each expression carries
sufficiently different informational value. Further,
since the variables used are all standard data collection
items for the DMDC database, each variable expression will be
studied. The relative merit of any single or combined
variable from this study may be useful to managers seeking
appropriate data sources for other studies.

An important result of the analysis of these study
variables is the observation that many of the necessary
assumptions for standard parametric hypothesis testing,
Analysis Of Variance (ANOVA), and possibly regression will
not be met. These include assumptions about the form of the
distribution as well as the scale of the variable. In this
study, analysis will initially seek to use standard
parametric methods. However, if results of the analysis are
sensitive to distributional or scale assumptions, those
assumptions will be checked. If the assumptions are not
satisfied, or if there is a nonparametric test of
similar efficiency, nonparametric tests will be conducted as
a replacement or as a confirmatory procedure.

C.   BIVARIATE  ANALYSIS 

This section will concentrate on identifying
relationships between pairs of variables, and on identifying
shifts in distribution as a function of the effects, or
categorical, variables. Three methods of analysis will be
used in this section. The first method is analysis of
association using a matrix of Pearson product-moment
correlations. This will provide initial information as to
the strength of association between any two variables, and
the direction of that relationship, being either positively
or negatively correlated. The second method will be analysis
of scatterplots of pairs of variables, using the techniques
of LOWESS and Jittering to better view any trends in the
variables. This method will give initial information on what
type of fitted line, and hence what mathematical
relationship, exists between independent and dependent
variables. Of significant interest will be whether the
relationship is fundamentally linear, or whether it is
possibly polynomial or curvilinear. The third and final
method used will be analysis of three-dimensional empirical
distribution plots. This will demonstrate some shifts in
distribution within several of the effects variables.

1.   Correlation Matrix

As  earlier  mentioned,  the  purpose  of  reviewing  the 
Pearson  product-moment  correlation  matrix  is  to  identify 
pairs  of  variables  which  have  a  strong  association.  The 
range  of  the  correlation  coefficient,  rho,  is  from  -1  to  +1, 
and  a  value  of  zero  indicates  that  the  variables  have  no 
linear  association  with  each  other.  A  value  of  +1  indicates 
an  exact  direct  linear  relationship,  while  a  -1  indicates  an 
exact  inverse  linear  relationship.  This  measurement  of 
association  is  not  completely  indicative  of  dependency,  and 
is  only  a  preliminary  tool  to  identify  candidate  variables 
for  testing  and  subsequent  inferential  statistics. 

Remembering the central question of this thesis, the most
important pairs of variables will then be any of the
intelligence and academic scores paired with the promotion
rate variables. Of almost equal interest will be any
interval scale effects variables demonstrating a strong
linear relationship with the promotion variables.

The strength of the linear relationship between two
variables, or its level of significance, is based on how much
variance there is in the estimated value of rho. Further,
the variance of rho is dependent on the sample size being
considered. For example, if the sample size were small, and
the value of rho had a standard deviation of .3, then a large
positive or negative value of rho would be needed to
effectively demonstrate significance. Conversely, for a large
sample set with very small standard deviation for rho, a much
smaller rho value could be considered significant. An
estimate for the standard deviation of rho can be found by
computing the inverse of the square root of the sample size.
Considering the thesis sample size of 37,854, the resulting
estimate of the standard deviation of rho is .005139. Thus,
a value of rho different from zero by plus or minus .01,
about two standard deviations, could be considered
significant.
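
The arithmetic behind this estimate can be sketched briefly. The function name below is hypothetical, and the two-standard-deviation cutoff is an assumption about how the .01 figure was reached rather than a statement from the source.

```python
import math


def approx_sd_of_rho(n):
    """Large-sample approximation described in the text: 1 / sqrt(n)."""
    return 1.0 / math.sqrt(n)


n = 37854                 # thesis sample size
sd = approx_sd_of_rho(n)  # approximate standard deviation of rho
threshold = 2 * sd        # assumed cutoff: about two standard deviations

print(f"sd of rho is about {sd:.6f}; cutoff is about {threshold:.4f}")
```

With a sample this large, even correlations of a few hundredths clear the cutoff, which is why small rho values in Table X are still treated as significant.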

In Table X the complete Pearson product-moment
correlation matrix for the study variables is given. The
Pearson product-moment computation is a parametric method and
assumes pairs of normal and continuous variables. This is
the preferred method since we are primarily interested in
correlations with either the RATE or PRA variable as one of
the pair of variables. Additionally, it is possible, using
the Spearman nonparametric method, to compute a correlation
value rho for pairs of ordinal, or higher scale, variables.
[Ref. 13:pp. 251-253] The Spearman method is a distribution
free method providing correlations based on the ranks of the
variables. The last column on the second part of Table X
lists the correlations computed using the Spearman method.
Comparison of Spearman versus Pearson values showed that
there was an acceptable correspondence between the two
methods, and Pearson values are used exclusively to simplify
analysis.
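
To make the distinction between the two methods concrete, a minimal sketch follows. It is not the software used in the thesis; the function names and data are made up for illustration. Spearman's value is computed here as the Pearson correlation of the ranks, with tied values sharing their average rank.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up data: perfectly monotone but not linear.
x = [1, 2, 3, 4, 5, 6]
y = [1, 4, 9, 16, 25, 36]
print(pearson(x, y), spearman(x, y))
```

For monotone but curved data the rank-based value reaches one while the Pearson value stays below it; close agreement between the two, as reported above, suggests the ranks and raw values tell the same story.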

Even with application of both the Spearman and Pearson
methods there remained several variables which did not meet
the assumed distributional characteristics for correct
interpretation of the rho value. These variables are
the discrete, nominal variables SEX, RACETH, and possibly
CMF. Their results are included in Table X, but any
interpretation of their rho values would not be meaningful.
The most important rho values in Table X are located under
the PRA column and are underlined.

TABLE X   Pearson Correlation Coefficients

        PRATE   RATE    PRA     GTSCR   AFQTP   OAFQTP  EIMCAT  PQSCR

PRATE   1.000    .822    .790    .035    .100    .177    .174    .039
RATE     .822   1.000    .951    .118    .155    .209    .200    .101
PRA      .790    .951   1.000    .107    .133    .177    .170    .094
GTSCR    .035    .118    .107   1.000    .741    .734    .689    .274
AFQTP    .100    .155    .133    .741   1.000    .937    .903    .308
OAFQTP   .177    .209    .177    .734    .937   1.000    .955    .315
EIMCAT   .174    .200    .170    .689    .903    .955   1.000    .305
HIYRED   .156    .168    .177    .210    .215    .245    .209    .066
EDLVL    .085    .139    .162    .266    .257    .266    .241    .100
NCOE    -.200    .047    .006    .039   -.009   -.060   -.062    .093
SEX      .013   -.019    .036    .055    .159    .050    .062   -.013
CMF     -.074   -.143    .000    .113    .106    .074    .067   -.042
RACETH  -.064   -.084   -.057   -.242   -.305   -.325   -.314   -.128
PAYGD   -.495    .000    .000    .143    .087    .031    .023    .097
PQSCR    .039    .101    .094    .274    .398    .315    .305   1.000

PEARSON COEFFICIENTS CONTINUED                          SPEARMAN

        PAYGD   HIYRED  EDLVL   NCOE    SEX     CMF     RACETH  PRATE

PRATE   -.495    .157    .085   -.200    .013   -.075   -.064   1.000
RATE    -.000    .168    .139    .047   -.018   -.142   -.084    .808
PRA      .000    .178    .162    .005    .036    .000   -.056    .777
GTSCR    .143    .210    .265    .039    .054    .113   -.242    .020
AFQTP    .087    .215    .258   -.009    .159    .107   -.306    .075
OAFQTP   .031    .245    .266   -.060    .049    .074   -.325    .165
EIMCAT   .023    .209    .242   -.062    .063    .068   -.313    .158
HIYRED   .001   1.000    .708   -.063    .131    .146    .024    .147
EDLVL    .098    .708   1.000    .004    .114    .177    .039    .038
NCOE     .433   -.063    .004   1.000   -.081   -.184    .015   -.208
SEX      .057    .131    .114   -.081   1.000    .258    .042    .020
CMF      .053    .146    .177   -.184    .258   1.000    .025   -.069
RACETH  -.016    .024    .039    .015    .042    .025   1.000   -.092
PAYGD   1.000    .000    .098    .432   -.056   -.054   -.016   -.535
PQSCR    .097    .066    .100    .093   -.013   -.042   -.128

The most significant observations from the tables are
summarized as follows:

For the variable RATE there is zero correlation with the
PAYGD variable. Thus, the transformation of PRATE to RATE
did remove the influence of paygrade on promotion rate.
Similarly, for the variable PRA, both PAYGD and CMF have zero
correlation.

As expected, the three promotion rate variables are all
highly correlated in a positive direction.

With two exceptions, the correlation values for the
effects and independent variables have similar magnitudes and
signs across all three expressions of promotion rate. The
first exception is the NCOE variable. With PRATE it is
negatively correlated, with a value of -.200, while with RATE
and PRA it is positively correlated, with smaller values.
This result makes sense when one considers that NCOE is
highly correlated with PAYGD, (0.565). Specifically, raw
promotion rates are lower for higher grade NCO's due to time
in service and time in grade requirements, (-.495). Hence,
NCOE, which is highly correlated with PAYGD, will also
reflect that inverse relationship. When the influence of
paygrade is eliminated, as it is in RATE and PRA, this
negative correlation is incidentally removed.

The second exception is the variable SEX, which is
positively signed for PRATE and PRA, but negatively signed
for RATE. The magnitudes of all three values are close to
zero. An explanation for the difference in sign between PRA
and RATE will be presented in the analysis of empirical
distributions and coded scatterplots.

Groups of closely related variables have generally the
same correlation across the three promotion variables.
Specifically, AFQTP, OAFQTP, EIMCAT, and to a lesser extent,
GTSCR, all demonstrate a strong positive correlation with
each other, and show the same trend when compared against the
promotion rate variables. The academic variables HIYRED and
EDLVL demonstrate similar characteristics; however, EDLVL is
weaker than HIYRED with respect to the promotion rate
variables.

Considering RATE and PRA as the better promotion
variables to model with, and allowing for only one variable
from each of the related groups, the six most significantly
correlated variables were selected. These variables, listed
in descending absolute value of rho, are shown in Table XI.

TABLE XI   Most Significant Correlated Variables
           Considering both RATE and PRA

Variable     Rho Value

HIYRED       approx   0.17
OAFQTP       approx   0.14
GTSCR        approx   0.10
PQSCR        approx   0.09
RACETH       approx  -0.06
NCOE         approx   0.006

These variables, paired either with RATE or PRA, were
used as the starting basis for multivariate regression
analysis. The effects variable SEX was included for
subcategory analysis in an effort to detect any influence it
might have on the primary relationships.

2.   Paired Scatter Plots and Simple Regression

Plots of paired independent and dependent variables
were produced to accomplish two purposes. The first
purpose was to visually search for any dominant plotting
patterns. Since the rho values found in the previous section
are designed to detect only linearity, it is quite possible
that nonlinear relationships could exist between the
explanatory and dependent variables. For example, if the X-Y
relationship was strictly Y = X², with X distributed
symmetrically about zero, a computed rho value should
be zero. Thus, if one relied only on correlation
coefficients to detect relationships, one would be misled
into thinking that no relationship existed between the two
variables. Simply plotting X-Y scatterplots of the
explanatory variables with the promotion variables did not
require specification of the response of the dependent
variable. Visual observation could then be relied upon to
detect dominant patterns of any form. These scatterplots
used two special procedures, LOWESS and Jittering, which will
be described in analysis of Figures 4.12 and 4.13.
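
The point about a purely quadratic relationship can be checked numerically. The sketch below uses made-up X values; it assumes X is placed symmetrically about zero, which is the condition under which the linear correlation of X and its square vanishes.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# X symmetric about zero: Y is completely determined by X, yet rho is zero.
x_symmetric = list(range(-5, 6))
rho_symmetric = pearson(x_symmetric, [v ** 2 for v in x_symmetric])

# X entirely on one side of zero: the same curve now looks nearly linear.
x_one_sided = list(range(0, 11))
rho_one_sided = pearson(x_one_sided, [v ** 2 for v in x_one_sided])

print(rho_symmetric, rho_one_sided)
```

The second case is a reminder that a large rho does not by itself establish that the underlying relationship is a straight line, which is part of the motivation for inspecting scatterplots.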

Secondly, simple least squares regression was performed
for all variables which had been previously found to be
significantly correlated. The simple least squares
regression procedure yielded a value called the Coefficient
of Determination, or R2 (R-square). R2 is mathematically
related to rho, and in the one variable case, the square
of rho is equal to R2. Thus, R2 can also be used to
qualitatively interpret the strength of linearity for a
simple linear model. The advantage of producing R2 values
was that R2 directly represents the proportion of variance
accounted for by the assumption of a linear model. The
results for each of the regressions and an explanation of R2
will be discussed in analysis of Table XII.
a.   Paired  Scatterplots 

Since interpretation of the correlation
coefficients assumes linearity, visual analysis of pairwise
scatterplots was used to search for observable patterns,
linear or otherwise. This visual approach did not require
interpretation of single derived parameters to identify any
patterns.

In producing the scatterplots the LOWESS procedure was
used. LOWESS, which stands for Locally Weighted Regression
Scatter Plot Smoothing, [Ref. 12:pp. 94-95] is a nonparametric
smoothing procedure which is designed to estimate functional
relationships between Y and X. In particular, no linear or
quadratic relationship is assumed. For scatterplots of
discrete variables against the continuous promotion rate
variables, the discrete variables were Jittered to overcome
repeated plotting of points. Jittering involves generating
small random increments, which are then added to the X
values. As a result, when the X-Y plot is performed fewer X
values are repeatedly plotted in the same location, and a
better visual interpretation can be made of the quantity of X
values at a discrete level.
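
A minimal sketch of Jittering follows. The source says only that small random increments are added to the X values; the uniform distribution, the default width of 0.4 and the function name are assumptions made here for illustration.

```python
import random


def jitter(values, width=0.4, seed=None):
    """Add a uniform random offset in [-width/2, +width/2] to each value.

    Assumption: uniform offsets; the thesis does not specify the
    distribution or size of the increments.
    """
    rng = random.Random(seed)
    return [v + rng.uniform(-width / 2, width / 2) for v in values]


# One hundred made-up observations sharing the same discrete level.
level = [3] * 100
jittered = jitter(level, width=0.4, seed=1)
print(min(jittered), max(jittered))
```

Keeping the width below the spacing between adjacent discrete levels (one unit here) prevents jittered points from being mistaken for a neighboring level.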

The overall results of the LOWESS plots showed that the
predominant pattern was indeed linear. Further, the linear
pattern was demonstrated most clearly between pairs of highly
correlated variables. Figures 4.12 and 4.13 demonstrate that
linearity, along with the LOWESS and Jittering techniques
respectively. As a result, linear modelling techniques were
considered to be the best choice for subsequent analysis.

LOWESS PLOT OF PRA VS OAFQTP

Figure 4.12


Figure 4.13
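
For readers unfamiliar with LOWESS, the idea can be sketched in a few lines. This is a simplified version written for illustration, not the routine used to produce Figure 4.12: it fits a weighted straight line around each point using tricube weights and omits the robustness iterations of the full procedure. The function name and the frac parameter are assumptions.

```python
def lowess(x, y, frac=0.5):
    """Simplified LOWESS: local linear fits with tricube weights.

    Assumption: no robustness (outlier down-weighting) passes are made.
    frac is the fraction of points used in each local fit.
    """
    n = len(x)
    k = max(2, int(round(frac * n)))
    fitted = []
    for x0 in x:
        # Distance to the k-th nearest point sets the local bandwidth.
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12
        weights = []
        for xi in x:
            u = abs(xi - x0) / h
            weights.append((1 - u ** 3) ** 3 if u < 1 else 0.0)
        sw = sum(weights)
        mx = sum(w * xi for w, xi in zip(weights, x)) / sw
        my = sum(w * yi for w, yi in zip(weights, y)) / sw
        sxx = sum(w * (xi - mx) ** 2 for w, xi in zip(weights, x))
        sxy = sum(w * (xi - mx) * (yi - my)
                  for w, xi, yi in zip(weights, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Made-up, exactly linear data: the smooth should reproduce the line.
x_demo = [float(i) for i in range(20)]
y_demo = [2 * xi + 1 for xi in x_demo]
smooth = lowess(x_demo, y_demo, frac=0.5)
```

When the smoothed curve stays close to a straight line, as reported for the thesis data, a linear model is a reasonable starting point.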


b.   Simple  Regression 

For pairs of significantly correlated variables,
a simple least squares regression, using PRA as the
dependent variable, was performed. The simple least
squares regression for pairs yields quantitative results in
terms of slope values, intercept values, tests of the slope
and intercept values, and the R2 value.

The R2 value represents what proportion of total variance
was explained by the simple linear model. As such, its
values range from zero to one. An R2 value of zero would
indicate that a linear model does not account for any
variance of the dependent values. Correspondingly, a value
of zero would be the estimate of the slope of the line. The
significance of R2, like rho, is related to sample size. To
determine the significance of an R2 value, the results of the
T test for the slope of the model are checked. If the T
statistic is large and the probability of a greater T value
small, a null hypothesis of a slope of zero is strongly
rejected. Thus, we can be confident of the linearity of the
model and the derived slope estimate. Sample size is
considered in this test because the T statistic is computed
as a function of sample size. Thus, even with a small R2
value, if the T test for the slope were significant, the R2
value would necessarily be held as significant. The only
qualification for a low R2 value would be that there exists
considerable 'noise' or unaccounted variance in the response
of the dependent variable. A summary of results is shown in
Table XII.

TABLE XII   Simple Least Squares Summary Data
            using PRA as Dependent Variable

Variable  Intercept  Std Err     Slope    Std Err     R2      T

GTSCR     -0.856     (0.0061)     0.008   (5.6E-04)   .013*   13.8
AFQTP     -0.338     (0.014)      0.006   (0.0002)    .018*   26.1
OAFQTP    -0.336     (1.6E-02)    0.007   (3.2E-04)   .033*   22.5
EIMCAT     0.004     (0.027)     -0.003   (0.005)     .000    -.5
HIYRED    -0.005     (0.047)     -0.001   (0.008)     .000    -.2
EDLVL      0.011     (0.054)     -0.003   (0.008)     .000    -.02
NCOE      -0.020     (0.021)      0.003   (0.003)     .000    1.1
SEX        0.011     (0.028)     -0.018   (0.024)     .000    -.7
CMF       -0.023     (1.6E-02)    0.000   (2.6E-04)   .000     .9
RACETH    -0.009     (0.018)     -0.001   (0.010)     .000    -.1
PAYGD     -0.045     (0.093)      0.007   (0.018)     .000     .3
PQSCR     -0.059     (5.4E-02)    0.007   (6.9E-04)   .008*   10.6
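
The quantities in such a table can be reproduced for any pair of variables with a short calculation. The sketch below uses made-up data and a hypothetical function name; it is not the thesis software. It also illustrates two facts relied on in the discussion: in the one variable case R2 equals the square of rho, and the T statistic for the slope grows with sample size for a fixed R2, since T squared equals (n - 2) R2 / (1 - R2).

```python
import math


def simple_ols(x, y):
    """Simple least squares fit of y on x with R2 and the slope T statistic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares around the fitted line.
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    r2 = 1 - sse / syy
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return {
        "n": n,
        "intercept": intercept,
        "slope": slope,
        "se_slope": se_slope,
        "r2": r2,
        "t_slope": slope / se_slope,
        "rho": sxy / math.sqrt(sxx * syy),
    }


# Made-up scores (x) and standardized promotion rates (y).
x_demo = [10, 20, 30, 40, 50, 60, 70, 80, 90]
y_demo = [0.1, -0.2, 0.3, 0.0, 0.4, 0.2, 0.6, 0.3, 0.7]
fit = simple_ols(x_demo, y_demo)
print(fit)
```

Because of the (n - 2) factor, a sample of nearly 38,000 records can produce a large T statistic even when R2 is only a few hundredths.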

Important  observations  from  the  simple  paired  regression 
analysis  are  summarized  in  the  following  paragraphs. 

Very few sets of pairs result in a significant R2 value.
Those that do are: GTSCR, OAFQTP, and PQSCR. All three of
these variables have a positive slope. Analysis of residuals
for these pairs did show reasonable normality of residuals
and did not demonstrate any lack of homoscedasticity.

The remaining variables have a low value positive or
negative slope. For each of these variables, the 95%
Confidence Interval for the slope includes zero, so the
slope could be either positive or negative. Thus,
no observable ascending or descending relationship can be
claimed.

Using the variable RATE as the dependent variable in
the simple regressions results in the variables EIMCAT and
AFQTP having measurable R2 values and positive slopes.

As  expected,  the  results  of  the  simple  regression 
analysis  coincide  with  observations  taken  from  the 
correlation  table. 

When  considered  one  at  a  time,  there  appear  to  be  only  a 
handful  of  variables  demonstrating  a  reportable  relationship 
with  the  promotion  variables.  The  low  R2  value  for  each 
regression  indicates  either  a  large  proportion  of  pure  error, 
or  significant  unexplained  variance  due  to  other  explanatory 
variables  not  being  included. 

3.   3-D Empirical Density Plots

Three  dimensional  empirical  density  plots  were  used 
to  visually  check  for  distribution  changes  in  the  continuous 
variables  within  the  subcategories  of  SEX,  PAYGD  and  RACETH. 
Two  such  plots  will  be  discussed  because  they  depict  visually 
data  characteristics  identified  in  earlier  tabular  results. 
These  characteristics  were:  the  application  of  AFQT 
restrictions  by  congressional  mandate  in  1980,  and  the 
differences  in  OAFQT  scores  across  racial  groups. 
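
The quantity plotted in such a figure is, in essence, a separate empirical density of the score for each subcategory. The sketch below shows one way to compute it; the function name, the bin width of ten points and the data are all made up for illustration and are not taken from the thesis.

```python
from collections import defaultdict


def empirical_density_by_group(scores, groups, bin_width=10, low=0, high=100):
    """Return {group: [share of that group's scores falling in each bin]}."""
    n_bins = (high - low) // bin_width
    counts = defaultdict(lambda: [0] * n_bins)
    for score, group in zip(scores, groups):
        index = min(int((score - low) // bin_width), n_bins - 1)
        counts[group][index] += 1
    return {
        group: [count / sum(row) for count in row]
        for group, row in counts.items()
    }


# Made-up scores for two hypothetical paygrade groups.
scores = [22, 28, 35, 47, 52, 66, 12, 18, 58, 71, 88, 93]
groups = ["E-5"] * 6 + ["E-7"] * 6
densities = empirical_density_by_group(scores, groups)
for group, row in densities.items():
    print(group, [round(p, 2) for p in row])
```

Stacking one such row per subcategory along a third axis gives the kind of display used in the plots discussed in this section.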

The AFQT restriction is depicted in Figure 4.14, where
empirical densities for OAFQT are plotted for each paygrade.
Observing the three densities shows that only the E-7
paygrade distribution contains scores less than twenty. This
makes sense, considering that all the E-7 enlistments were
prior to 1980. Another interesting observation from this
plot is that high OAFQT scores become more dominant as
paygrade increases. This is most apparent in comparing the
E-7 density to either the E-5 or E-6. This shift in density
of OAFQT across the three paygrades suggests that attrition
tends to manifest itself in the lower AFQT categories, but
that a low AFQT score is, in itself, not prohibitive in
achieving senior enlisted rank.

The second 3-D empirical density plot, Figure 4.15, shows
the differences in renormed AFQT scores across racial
subcategories. A large discrepancy between the distribution
for whites and the distributions for blacks and Hispanics is
easily seen, although Indians have an AFQT distribution
similar to that of whites. This
observation coincides with the occurrence of different
promotion rates between different racial categories as well.
However, to make inferences about promotion policy among
races would require further research. As pointed out by
Daula [Ref. 11:pp. 7-10], the attrition pattern among
different  racial  groups  shifts  the  averages  for  both 
promotion  rate  and  AFQT  among  the  races  over  time.  Since  the 
purpose  of  this  thesis  is  one  of  prediction,  it  is  more 
important  to  identify  the  effect  and  account  for  it  in  the 
model.  An  explanation  as  to  the  cause  of  this  phenomenon 
does  not  appear  to  be  easily  obtained  from  the  thesis  data. 

What  is  important  about  this  plot  is  that  it  visually 
demonstrates  the  correlation  between  RACETH  and  OAFQT.  If 
OAFQT  is  a  significant  determiner  of  promotion  rate,  then 
RACETH will be an important covariate.

Figure 4.14   3-D Empirical Density Plot: OAFQT by PAYGD

Figure 4.15   3-D Empirical Density Plot: OAFQT by RACETH


D.   MULTIVARIATE  GRAPHICAL  ANALYSIS 

Multivariate  graphical  analysis  consisted  of  the  use  of 
Draftsman  Plots  and  Coded  Scatter  Plots  to  look  for 
relationships  when  more  than  two  dimensions  were  under 
consideration. [Ref. 12:pp. 135-139] One of these
procedures, the Coded Scatterplot, will be utilized to
demonstrate a significant data characteristic: the
distribution of SEX with respect to CMF and PRA, shown in
Figure 4.16.

Coded Scatterplots involved delineating one of the
effects variables as a third dimension, while plotting an
independent variable against a dependent promotion variable.
In Figure 4.16, CMF values were jittered (offset by a small
random amount so that overlapping points separate) and
plotted against the PRA variable, and the plot points were
coded as periods for males and the letter F for females.
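The coding scheme just described can be sketched in a few lines. This is only an illustration, not the procedure used to produce Figure 4.16: the records, the jitter width, the random seed, and the "M"/"F" coding of SEX are all assumed for the example.

```python
import random


def jitter(values, width=0.4, seed=0):
    """Add a small uniform random offset to each value so stacked points separate."""
    rng = random.Random(seed)
    return [v + rng.uniform(-width, width) for v in values]


def sex_marker(sex):
    """Plot symbol: a period for males, the letter F for females."""
    return "F" if sex == "F" else "."


# Hypothetical records (not thesis data): (CMF code, PRA score, SEX).
records = [(11, -0.3, "M"), (11, 0.2, "M"), (71, 0.1, "F"),
           (91, -0.1, "F"), (91, 0.4, "M")]

# Jitter only the CMF coordinate; PRA is already continuous.
x_coords = jitter([cmf for cmf, _, _ in records])
plot_points = [(x, pra, sex_marker(sex))
               for x, (_, pra, sex) in zip(x_coords, records)]

for x, pra, marker in plot_points:
    print(f"{x:7.2f} {pra:6.2f} {marker}")
```

Each tuple in `plot_points` holds the jittered CMF coordinate, the PRA coordinate, and the symbol to draw at that position.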


Figure 4.16   Coded Scatterplot: PRA vs CMF with SEX

Figure 4.16 demonstrates the higher density of female
personnel in the upper CMF range, which contains the more
technically oriented career management fields. This
corresponds to the CMF-SEX correlation coefficient of 0.258
found in Table X. Likewise, the distributions of both the
female and male PRA scores are symmetric about the zero line.
This corresponds to the zero value for the PRA-SEX
correlation coefficient also found in Table X.

E.   LINEAR  MODELS 

1.   Analysis  of  Variance

One-Way ANOVA was used in this thesis as an
intermediate step in defining a final inference model.
ANOVA's usefulness has been as an investigative tool to
detect differences in means among classes of explanatory
variables. For example, using PRA as the dependent variable
and EIMCAT as the independent variable, One-Way ANOVA will
compare and test the equality of the average PRA score across
the eight levels of EIMCAT, i.e., mental categories one
through eight. In the testing, the null hypothesis is that
all eight mental category PRA means are equal, while the
alternate hypothesis is that they are not. The test
statistic used to reject or accept the null hypothesis is the
F statistic. As such, a large F value, and subsequent
rejection of the null hypothesis, would indicate that there
exist significant differences between the means of the
promotion scores for some of the eight mental categories. In
general, a large F value can be considered to be any computed
F statistic greater than 3.8, the asymptotic 95 percent point
for a one degree of freedom model. The nature of these
differences could be a large discrepancy between a single
pair of categories, small discrepancies between all eight
categories, or any combination of difference conditions.
Thus, ANOVA has limited value in discerning the location and
magnitude of the differences between category means, but it
does identify if differences exist and how strong those
differences are.
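The F statistic and the R2 value reported for each One-Way ANOVA follow directly from the between-level and within-level sums of squares. The following is a minimal sketch, assuming three made-up levels of a class variable with invented PRA scores; it is not the SAS procedure used in the thesis, and the function name is hypothetical.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, R2, df_between, df_within) for a list of samples, one per level."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between-level sum of squares: spread of the level means about the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-level sum of squares: spread of observations about their own level mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    r_squared = ss_between / (ss_between + ss_within)
    return f_stat, r_squared, df_between, df_within


# Hypothetical PRA scores for three levels (illustrative values only).
levels = [
    [-0.4, -0.1, 0.0, -0.3],
    [0.1, 0.0, 0.3, 0.2],
    [0.5, 0.4, 0.7, 0.6],
]

f_stat, r_squared, df_b, df_w = one_way_anova(levels)
print(f"F = {f_stat:.2f} on ({df_b}, {df_w}) df, R2 = {r_squared:.3f}")
```

A large F arises when the level means are far apart relative to the scatter inside each level; R2 is the share of the total sum of squares attributable to the level means.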

Table XIII tabulates a twelve by three matrix of results
for separate One-Way ANOVAs. The rows are the twelve
explanatory variables and the columns are the three promotion
variables. Using all three promotion measures as the
dependent variable allowed for a check of ANOVA values and
trends across those measures.

In  addition  to  the  results  of  the  F  test,  a  value  of  R2 
is  reported.  This  R2  value  is  different  than  that  reported 
in  the  simple  linear  regression  model.  This  is  because  the 
ANOVA  procedure  considers  the  independent  variable  as  a  set 
of  levels,  rather  than  a  single  continuous  variable.  With 
One-Way  ANOVA,  all  variables  had  some  level  of  R2  reported. 
Further,  because  of  the  increased  informational  value  of 
variable  categories,  and  hence,  more  degrees  of  freedom  for 
computation,  the  values  of  R2  increased  above  the  simple 
regression  reported  values. 

It should be noted that technically, when the defined
continuous variables were put into ANOVA, their values were
grouped, and then the variables were treated as if they were
discrete. Because the SAS software and computational
resources used could handle all the integer values for the
score ranges of AFQTP and the other continuous variables, it
was possible to gain insight into the existence of
differences between individual score cells.

Additionally, nonparametric procedures were used to
evaluate the relationships. [Ref. 13:pp. 250-255] The
nonparametric ANOVAs utilized the ranks of the variables and
also yielded the F statistic for testing the hypothesis of
equal level means. Having agreement between the parametric
and nonparametric values removed the need to pursue
confirmation of assumptions for parametric ANOVA. It
also allowed analysis of results to focus on the resultant
values of F and R2 tabulated in Table XIII.
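One common way to obtain an F statistic from ranks is to pool the observations, replace each by its rank (ties sharing the average rank), and run the ordinary One-Way ANOVA on the ranks. The sketch below assumes that form of the procedure; the thesis does not state which variant was run in SAS, and the data are invented to show how one extreme score affects the raw F but not the rank F.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the tie
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def f_statistic(groups):
    """Ordinary one-way ANOVA F statistic."""
    pooled = [x for g in groups for x in g]
    grand = mean(pooled)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (len(groups) - 1)) / (ss_within / (len(pooled) - len(groups)))


def rank_f_statistic(groups):
    """F statistic computed on the pooled ranks instead of the raw scores."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    ranked_groups, pos = [], 0
    for g in groups:
        ranked_groups.append(ranks[pos:pos + len(g)])
        pos += len(g)
    return f_statistic(ranked_groups)


# Hypothetical scores; the value 60.0 is a deliberate outlier.
groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 60.0]]
raw_f = f_statistic(groups)
rank_f = rank_f_statistic(groups)
print(f"raw F = {raw_f:.2f}, rank F = {rank_f:.2f}")
```

When the raw and rank-based F statistics lead to the same conclusion, departures from the parametric assumptions are unlikely to be driving the result, which is the argument made in the text.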


TABLE XIII   One-Way ANOVA Summary

                  PRATE                RATE                 PRA
Variable        F       R2           F       R2           F       R2

SEX (1)         5.9    .00016       13.3    .00351       48.4    .00128
CMF (2)        35.     .02788       93.3    .07415        0.0    .00000
RACETH         90.     .01177      165.0    .02133       80.0    .01049
PAYGD (3)    6292.     .24953        0.0    .00000        0.0    .00000
GTSCR          18.     .04250       13.4    .03184       10.9    .02636
AFQTP          32.     .07046       20.6    .04623       17.3    .03908
OAFQTP         36.     .08441       25.3    .06101       19.     .04657
EIMCAT         37.     .01076       71.5    .02035       96.9    .02739
HIYRED         96.     .02950      106.0    .03272      117.     .03590
EDLVL          37.     .01076       71.5    .02035       96.9    .02739
NCOE          156.     .05097       76.4    .02499       46.8    .01583
PQSCR           1.9    .00375        6.6    .01341        5.8    .01181

(1) The Pr>F (level of rejection of the null hypothesis of no
    difference in means) was .0145 for PRATE, .0003 for RATE
    and .0001 for PRA.
(2) The Pr>F for PRA is 1.0.
(3) The Pr>F for RATE is 1.0, and for PRA is 1.0.
Values of Pr>F for the remainder of the table were .0001.

Review of Table XIII demonstrates some anticipated
results, which are summarized in the following paragraphs.

Since  the  variables  PAYGD  and  CMF  were  controlled  for  in 
the  derivation  of  PRA,  there  is  correspondingly  no 
relationship  between  those  variables  and  the  PRA  promotion 
variable.  Likewise,  the  variable  PAYGD  was  controlled  for  in 
the  derivation  of  RATE,  and  there  was  no  linear  relationship 
demonstrated for that pair. The zero values for the F
statistic and R2 for those variable combinations document
this fact.

Using  RATE  or  PRA  as  the  dependent  variable,  and  allowing 
for  only  one,  most  significant  variable  to  be  selected  from 
each  of  the  intelligence  and  academic  groups,  results  in  the 
same  set  of  explanatory  variables  as  were  found  in 
correlation  analysis.  These  variables  were:  HIYRED,  OAFQTP, 
GTSCR,  PQSCR,  RACETH,  NCOE,  and  SEX.  The  most  significant 
variables  were  the  ones  which  had  the  larger  F  statistic,  and 
R2  value.  This  set  is  not  ordered,  however,  since  there  are 
differences  in  order  between  the  PRA  and  RATE  models. 

Another interesting development from ANOVA results when
the mean and variance for each level of the explanatory
variable are plotted against the promotion variable. This is
not a standard analytical plot, but it does provide some
visual information on the size, direction, and dispersion
about the center line of an independent discrete variable.
This plot is most similar to a strip box plot for continuous
variables.

An example plot, where each individual's PRA score was
plotted against the sum of his EIMCAT and HIYRED scores, is
shown in Figure 4.17. In Figure 4.17 the two center lines
plotted represent the sum of scores for EIMCAT and HIYRED,
separated between the GED qualified personnel and High School
Diploma qualified personnel. The outside two lines trace the
upper and lower bounds one standard deviation from the
computed means.
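The center lines and one-standard-deviation bounds of such a plot can be computed by grouping the records on diploma category and combined score level. The sketch below is illustrative only: the group labels, score levels, and PRA values are invented, and the use of the sample standard deviation is an assumption.

```python
from collections import defaultdict
from statistics import mean, stdev


def level_bands(records):
    """Map (group, level) -> (mean - sd, mean, mean + sd) of the PRA scores in that cell."""
    cells = defaultdict(list)
    for group, level, pra in records:
        cells[(group, level)].append(pra)
    bands = {}
    for key, scores in cells.items():
        center = mean(scores)
        spread = stdev(scores)  # sample standard deviation; needs two or more scores
        bands[key] = (center - spread, center, center + spread)
    return bands


# Hypothetical records: (diploma category, EIMCAT + HIYRED level, PRA score).
records = [
    ("HSD", 14, 0.2), ("HSD", 14, 0.4), ("HSD", 15, 0.5), ("HSD", 15, 0.7),
    ("GED", 14, -0.1), ("GED", 14, 0.1), ("GED", 15, 0.2), ("GED", 15, 0.4),
]

bands = level_bands(records)
for (group, level), (lower, center, upper) in sorted(bands.items()):
    print(f"{group} level {level}: {lower:.2f} {center:.2f} {upper:.2f}")
```

Plotting the center values against level, one line per diploma category, gives the two center lines; the lower and upper values give the outside lines.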


Figure 4.17   X-Y Plot of Means and Variances: PRA vs HIYRED + EIMCAT


By plotting a separate line for each high school diploma
category it can be seen that, while both groups have a similar
increase in promotion rate as the combined level of EIMCAT
and HIYRED increased, the GED qualified personnel were
consistently a fixed level lower than fully qualified high
school graduates. Thus, the additional merit of an actual
high school diploma did manifest itself in promotion rate.

A  final  look  at  ANOVA  involves  specifying  a  model  using 
the  set  of  the  seven  most  significant  independent  variables, 
and  then  checking  for  interactions  among  them.  Table  XIV 
gives  the  results  of  the  Seven-Way  ANOVA  using  this  model: 

RATE  =   7  Main  Effects  +  Two  Way  Interactions 

Table  XIV  depicts  the  seven  most  significant  variables 
individually  in  the  Main  Effects  rows,  and  the  interaction 
terms  in  the  Interactions  rows. 

The  advantage  of  this  Seven-Way  ANOVA  is  that  inclusion 
of  all  of  the  explanatory  variables  simultaneously  allows  for 
comparison  of  the  significance  of  each  of  the  explanatory 
variables  relative  to  the  others.  Additionally,  specifying 
combinations  of  two-way  interactions  checks  to  see  if  any  two 
of  the  explanatory  variables  are  significantly  related  to  one 
another. An example of an interaction would be a SEX and CMF
term. As has been previously shown, female personnel tend to
be associated with higher CMF values. If the ANOVA model for
promotion included a term which was the product of the two
values, SEX*CMF, then the two attributes would be jointly
considered in the ANOVA model. If the interaction term was
found to be significant, then the two individual variable
entries for CMF and SEX would be removed and only the
interaction term retained.

An additional consideration in the Seven-Way ANOVA was
that the model was unbalanced. Unbalanced means that there
were  some  combinations  of  the  factor  levels  which  did  not 
have  any  entries  in  the  ANOVA  cells.  An  example  of  this  can 
be  seen  in  the  SEX*OAFQT  term.  Specifically,  there  are  only 
76  degrees  of  freedom  for  the  interaction  term,  while  the 
individual  degrees  of  freedom  for  SEX  and  OAFQT  are  1  and  79 
respectively.  Thus,  the  SEX*OAFQT  term  had  three 
combinations  without  entries.  As  a  result,  the  F  statistic 
computed  will  be  only  approximate.  Since  the  purpose  of  this 
step  in  analysis  was  exploratory,  the  F  statistic  estimates 
were  considered  adequate. 
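The degrees-of-freedom count in the SEX*OAFQT example can be checked by counting occupied cells. The sketch below assumes the design remains connected, so that each empty cell removes one interaction degree of freedom; which three SEX and OAFQT combinations were actually empty is not stated in the thesis, so the three cells removed here, the score range, and the "M"/"F" labels are purely hypothetical.

```python
def interaction_df(observed_cells, levels_a, levels_b):
    """Interaction degrees of freedom for a two-factor layout with empty cells.

    Assumes a connected design, where each empty cell costs one degree of freedom.
    """
    full_df = (levels_a - 1) * (levels_b - 1)
    empty_cells = levels_a * levels_b - len(observed_cells)
    return full_df - empty_cells


SEX_LEVELS = ("M", "F")
OAFQT_LEVELS = range(20, 100)  # 80 hypothetical distinct scores (79 df)

# Hypothetical choice of the three combinations with no records.
EMPTY = {("F", 97), ("F", 98), ("F", 99)}
observed_cells = {(s, q) for s in SEX_LEVELS for q in OAFQT_LEVELS} - EMPTY

print(interaction_df(observed_cells, len(SEX_LEVELS), len(OAFQT_LEVELS)))
```

With 2 levels of SEX and 80 levels of OAFQT, a full layout gives 1 x 79 = 79 interaction degrees of freedom, and three empty cells reduce this to 76.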

Table XIV presents the results of a Seven-Way ANOVA using
RATE as the dependent variable. Similar results were
obtained using PRA as the dependent variable.

TABLE XIV   7-Way Analysis of Variance with Interaction

DEPENDENT VARIABLE: RATE

SOURCE               DF        SSQ   MEAN SQUARE   F VALUE   PR > F        R2
MODEL             14966   18869.39      1.260818      1.52   0.0001   0.49852
ERROR             22887   18981.65      0.829364   ROOT MSE  0.91069421
CORRECTED TOTAL   37853   37851.04

SOURCE               DF   ANOVA SS   F VALUE   PR > F

Main Effects
RACETH                5     807.35    194.69   0.0001
SEX                   1      13.28     16.02   0.0001
OAFQT                79    1670.54     25.50   0.0001
HIYRED               12    1238.25    124.42   0.0001
GTSCR                93    1205.22     15.63   0.0001
NCOE                 13     945.89     87.73   0.0001
PQSCR                78     507.52      7.85   0.0001

Interactions
RACETH*SEX            5       0.00      0.00   1.0000
SEX*OAFQT            76     440.59      6.99   0.0001
SEX*HIYRED            9      66.03      8.85   0.0001
SEX*GTSCR            72      72.80      1.22   0.0999
SEX*NCOE             11      57.76      6.33   0.0001
SEX*PQSCR            70      53.06      0.91   0.6795
RACETH*OAFQT        335       0.00      0.00   1.0000
RACETH*HIYRED        46     107.84      2.83   0.0001
RACETH*GTSCR        326       0.00      0.00   1.0000
RACETH*NCOE          46       8.41      0.22   1.0000
RACETH*PQSCR        288     104.24      0.44   1.0000
OAFQT*HIYRED        593     112.62      0.23   1.0000
OAFQT*GTSCR        2864    2418.55      1.02   0.2570
OAFQT*NCOE          614     954.24      1.87   0.0001
OAFQT*PQSCR        3631    3182.33      1.06   0.0137
HIYRED*GTSCR        564     130.88      0.28   1.0000
HIYRED*NCOE          88     276.98      3.80   0.0001
HIYRED*PQSCR        518     484.13      1.13   0.0251
GTSCR*NCOE          604     718.86      1.44   0.0001
GTSCR*PQSCR        3383    2997.93      1.07   0.0051
NCOE*PQSCR          542     504.44      1.12   0.0268
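The F values and the R2 in Table XIV are tied to the tabulated sums of squares by simple arithmetic: each F is the effect mean square (SS divided by DF) over the error mean square, and R2 is the model sum of squares over the corrected total. The short check below uses only figures from Table XIV; the function and variable names are illustrative.

```python
# Figures taken from Table XIV.
MODEL_SS, ERROR_SS, TOTAL_SS = 18869.39, 18981.65, 37851.04
ERROR_DF = 22887

mean_square_error = ERROR_SS / ERROR_DF  # tabulated as 0.829364


def f_value(effect_ss, effect_df, mse=mean_square_error):
    """F statistic for one source: its mean square divided by the error mean square."""
    return (effect_ss / effect_df) / mse


r_squared = MODEL_SS / TOTAL_SS

# (source, DF, ANOVA SS, tabulated F) for a few rows of Table XIV.
rows = [
    ("RACETH", 5, 807.35, 194.69),
    ("SEX*OAFQT", 76, 440.59, 6.99),
    ("SEX*HIYRED", 9, 66.03, 8.85),
    ("HIYRED*NCOE", 88, 276.98, 3.80),
]

for source, df, ss, tabulated in rows:
    print(f"{source:12s} computed F = {f_value(ss, df):7.2f}   tabulated F = {tabulated}")
print(f"R2 = {r_squared:.5f}")
```

The same relation explains why interaction terms with very large degrees of freedom can carry a large sum of squares and still show an F value near one.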

Three important observations can be obtained from Table
XIV. The first observation is that there are few significant
interaction terms. Only those terms marked with an asterisk
demonstrated statistical significance with the PR > F at
level .0001. Of these, only three had F values greater than
3.8. These interaction terms were OAFQTP, HIYRED, and NCOE,
all interacting with SEX. The presence of interaction seen in
the Seven-Way ANOVA model was previously observed in the
correlation matrix, Table X, where SEX was positively
correlated with HIYRED and OAFQTP (0.05 and 0.131,
respectively), and negatively correlated with NCOE (-0.081).
The implication of having significant interaction terms is
that they would need to be included in any predictive model.
Thus, identification of interactions using ANOVA was
critical.

Secondly,  all  the  main  effects  variables  continue  to  be 
significant,  even  when  used  simultaneously  by  the  model. 

Lastly,  selecting  the  single  most  significant  explanatory 
variable  from  the  academic  and  education  groups  yields  the 
same  unordered  best  set  as  did  the  One-Way  ANOVA:  OAFQTP, 
HIYRED,  GTSCR,  NCOE,  RACETH,  and  SEX. 

In  summary,  the  fundamental  result  of  ANOVA  was  the 
confirmation  that  there  are  differences  in  the  level  means  of 
promotion  scores  due  to  several  independent  explanatory 
variables,  and  an  agreement  as  to  which  were  the  best 
explanatory variables when considered separately or
simultaneously.

Also, plotting the means and variances of the sum of
EIMCAT and HIYRED versus PRA demonstrated that there was a
good increasing linear trend of the level means with PRA.
However, there was considerable variance within each class
level. The choice of EIMCAT and HIYRED as the explanatory
variables was important because those variables are both
discrete representatives from the academic aptitude and
education groups.
2.   ANCOVA 

The  use  of  One-Way  Analysis  of  Variance  in  the 
previous  section  was  primarily  to  confirm  the  existence  of 
significant  differences  among  the  levels  of  the  independent 
variables.  Beyond  acknowledging  that  there  are  some 
independent  variables  available  to  explain  promotion  rates, 
Seven-Way  ANOVA  did  not  provide  any  numerical  measure  of  the 
structural  form  of  the  contribution  of  a  given  independent 
variable  to  the  model.  [Ref.  14:p.  10]  In  addition,  in 
analysis  of  the  continuous  variables,  the  nature  of  the 
variable  was  changed  to  represent  a  discrete  valued  variable. 

Incorporating continuous variables into ANOVA was
achieved through the intermediate method of ANCOVA. ANCOVA
utilizes metric continuous variables as well as nonmetric
qualitative values. The result of ANCOVA was an improved
multivariate model with the inclusion of continuous variables
in their proper form. ANCOVA provided estimates of the
linear coefficients for the continuous variables, and
reported on the proportion of variance accounted for by each
categorical variable as well. These results provided the
basis for further removal of variables or interactions from
the set previously identified. [Ref. 15:pp. 343-349]

The model considered was based on the results of the
previous chapters and consisted of the following form:

Promotion = f(OAFQTP, PQSCR, GTSCR, HIYRED, NCOE, RACETH, SEX,
plus interaction terms SEX*HIYRED, SEX*NCOE, SEX*OAFQTP)

The  variables  OAFQT,  PQSCR,  and  GTSCR  are  metric  and 
continuous,  HIYRED  and  NCOE  are  discrete  and  metric,  and 
RACETH  and  SEX  are  discrete  and  nonmetric. 

A  representation  of  the  model  using  notation  consisted  of 
the  following  form: 

$$Y_i = B_0 + B_1 X_1 + B_2 X_2 + B_3 X_3 + D_1 + D_2 + D_3 + D_4 + I_1 + I_2 + I_3$$

In the above notation, Yi is the promotion variable PRA,
B0 is the linear intercept, and B1 through B3 are
coefficients for the continuous variables OAFQT, GTSCR and
PQSCR. The coefficients B1 through B3 are assumed to be the
same for all levels of the other variables. D1 through D4
represent the discrete variables RACETH, SEX, HIYRED, and
NCOE. I1 through I3 are the interaction terms OAFQT*SEX,
HIYRED*SEX, and NCOE*SEX.

This model is also unbalanced and the F statistics are
estimates. The results of the ANCOVA using this model are
shown in Table XV.

TABLE XV   ANCOVA with Interactions

DEPENDENT VARIABLE: PRA

SOURCE          DF        SSQ   MEAN SQUARE   F VALUE   PR > F       R2
MODEL           55    2423.68         44.07     47.13   0.0001   0.0642
ERROR        37798   35339.29         0.934   ROOT MSE  0.966
CORR TOTAL   37853   37762.98

SOURCE          DF    TYPE III SS   F VALUE   PR > F

Main Effects
OAFQT            1    12.89440024     13.79   0.0002
RACETH           5   152.10095609     32.54   0.0001
SEX              1     5.31950192      5.69   0.0171
HIYRED          12   517.91751116     46.16   0.0001
GTSCR            1     3.65772995      3.91   0.0479
NCOE            13   132.83314221     10.93   0.0001
PQSCR            1    80.15632971     85.73   0.0001

Interactions
OAFQT*SEX        1     4.03387863      4.31   0.0378
SEX*HIYRED       9    10.16825209      1.21   0.2844
SEX*NCOE        11    18.42527136      1.79   0.0496

                              T FOR H0:                STD ERROR OF
PARAMETER       ESTIMATE     PARAMETER=0   PR > |T|      ESTIMATE
INTERCEPT       0.25501          0.31       0.7592      0.83191986
OAFQT           0.00094          1.26       0.2077      0.00074544
GTSCR          -0.00104897      -1.98       0.0479      0.00053034
PQSCR           0.00422902       9.26       0.0001      0.00045674


There  are  three  important  observations  from  Table  XV. 
First,  the  main  effects  variables,  with  the  exception  of 
GTSCR,  are  still  significant  in  their  ability  to  account  for 
variance  in  the  model. 

Secondly, no interaction terms are significant. The PR >
F values for these terms are much greater than .0001 and each
has a small F value. Thus, the effect of the interaction
terms will be assumed to be negligible.

Lastly, the bottom portion of the ANCOVA table lists
estimates of regression coefficients for the continuous
variables. These estimates were tested, using the T
statistic, to see if they were significantly different from a
hypothesized value of zero. If the estimate was not
significantly different from zero, then the explanatory
variable did not possess sufficient predictive ability.
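The T statistic in this test is the coefficient estimate divided by its standard error. The sketch below reproduces the T column of Table XV from the tabulated estimates and standard errors; the dictionary layout and function name are illustrative only.

```python
# (estimate, standard error) pairs from the parameter block of Table XV.
PARAMETERS = {
    "INTERCEPT": (0.25501, 0.83191986),
    "OAFQT": (0.00094, 0.00074544),
    "GTSCR": (-0.00104897, 0.00053034),
    "PQSCR": (0.00422902, 0.00045674),
}


def t_statistic(estimate, std_error, hypothesized=0.0):
    """Distance of the estimate from the hypothesized value, in standard-error units."""
    return (estimate - hypothesized) / std_error


t_values = {name: t_statistic(est, se) for name, (est, se) in PARAMETERS.items()}

for name, t in t_values.items():
    print(f"{name:10s} T = {t:6.2f}")
```

The farther T lies from zero, the smaller the corresponding PR > |T| value and the stronger the evidence that the coefficient differs from zero.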

The  PQSCR  coefficient  has  a  small,  but  positive  slope 
with  a  value  of  0.0042,  and  is  significantly  different  from 
zero.  The  OAFQT  variable  has  a  slope  with  the  correct  sign 
and  magnitude,  but  it  is  not  significantly  different  from 
zero.  The  GTSCR  variable  demonstrates  a  negative  slope  and 
again  is  not  significantly  different  from  zero. 

The  negative  estimate  value,  combined  with  the  knowledge 
that  GTSCR  is  strongly  correlated  with  OAFQT,  indicated  a 
condition  of  multicollinearity  between  the  two  variables. 
Multicollinearity  implies  that  one  variable  may  be  simply  a 
surrogate  for  the  other  with  little  or  no  effect  as  a 
predictor. [Ref. 15:p. 415] Thus, the inclusion of GTSCR
coincident  to  OAFQT  was  considered  detrimental  to  the 
development  of  a  regression  model,  and  it  was  dropped  from 
subsequent  analysis. 

In  summary,  ANCOVA  resulted  in  the  elimination  of  the 
remaining  interaction  terms  from  consideration  in  the 
predictive  model.  The  estimated  values  of  OAFQT  and  GTSCR 
demonstrated  a  condition  of  multicollinearity  in  the  model, 
and  the  weaker  variable,  GTSCR,  was  eliminated.  The 
remaining variables to be considered in subsequent analysis
were: OAFQT, PQSCR, HIYRED, NCOE, RACETH, and SEX. These
results  were  considered  satisfactory,  in  that  the  remaining 
variable  set  contains  single  measures  of  academic  aptitude, 
education,  professional  education,  military  performance 
testing,  as  well  as  two  categorical  variables:  SEX  and 
RACETH. 

3.   The  Final  Model:  A  Multiple  Regression  (ANCOVA)
a.   Background 

Regression analysis with a reduced set of
variables was the final step in successive data analyses.
The important result of this analysis was a set of
coefficient values which provided quantitative estimates
of the independent influence of each of the
explanatory variables. Of specific importance was the
independent influence of OAFQT and HIYRED in predicting an
individual promotion rate.

In the development of the regression model this section
will:

1.  Review the pertinent results which led to the
regression model definition.

2.  Compare the model using the three promotion rate
variables.

3.  Select a single promotion variable for the model.

4.  Interpret the resulting regression estimates and
conduct sensitivity analysis.

5.  Check model assumptions and confirm the model using
an alternate data set and nonparametric procedures.

6.  Test the model by comparing actual versus predicted
promotion rates for population subcategories.

Previous results are reviewed in the following paragraphs.

ANOVA  and  ANCOVA  demonstrated  that  significant 
differences  exist  between  internal  levels  of  the  explanatory 
variables  as  a  function  of  average  promotion  rates. 

Paired  scatterplots  utilizing  smoothing  techniques,  and 
plots  of  the  level  means  found  in  ANOVA,  consistently 
demonstrated  an  ascending  linear  pattern  when  plotted  against 
promotion  variables. 

ANOVA  and  ANCOVA  models,  using  interactions,  resulted  in 
the  elimination  of  variables  which  did  not  demonstrate 
sufficient  linear  additive  effect  to  be  included  in  the 
model.  Further,  this  analysis  confirmed  that  there  was  no 
significant  interaction  among  the  remaining  variables. 
Correlation  analysis,  combined  with  the  in-depth  univariate 
analysis  as  to  the  nature  and  scoring  procedures  of  the 
individual  variables,  identified  groups  of  variables.  In 
subsequent  analysis,  these  groups  were  then  restricted  to 
allow  for  only  the  strongest  unique  variable  to  be  entered 
into  the  model. 

The  final  set  of  variables  for  entry  into  the  model  are 
the  following: 

Promotion = f(OAFQT, PQSCR, HIYRED, NCOE, RACETH, SEX)

This model is a mixed scale and variable type model,
including both discrete and continuous variables. Two of the
input variables have nominal scale, RACETH and SEX. To allow
for their entry into the model, these values were transformed
into dummy variables. Specifically, the variable SEX was
recoded as a 0/1 variable, while RACETH was represented with
five dummy 0/1 variables: D1 through D5. For example, for
the RACETH score of 1, the dummy variable D1 was coded with a
1 for every 1 entry and a zero for all others. This
procedure was applied for the next four levels, while score 6
was left as a 0/0 entry (zero in every dummy variable).
[Ref. 15:pp. 332-341]
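The recoding can be illustrated with a short sketch. The six RACETH levels and the use of level 6 as the all-zero reference follow the description in the text; the function names and the direction of the SEX coding ("F" mapped to 1) are assumptions made only for the example.

```python
def dummy_code_raceth(raceth, n_levels=6):
    """Return [D1, ..., D5] for a RACETH score; level 6 is the all-zero reference."""
    if not 1 <= raceth <= n_levels:
        raise ValueError(f"RACETH score out of range: {raceth}")
    return [1 if raceth == level else 0 for level in range(1, n_levels)]


def dummy_code_sex(sex):
    """Recode SEX as 0/1. Which sex is coded 1 is an assumption of this sketch."""
    return 1 if sex == "F" else 0


for score in range(1, 7):
    print(score, dummy_code_raceth(score))
```

Leaving one level without its own dummy variable avoids redundancy with the intercept: the coefficient on each of D1 through D5 then measures the shift relative to level 6.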

After application of the recoding just described, the
regression model can be defined with the notation:

$$Y_i = B_0 + B_1 X_1 + B_2 X_2 + B_3 X_3 + B_4 X_4 + D_1 + \cdots + D_5 + D_6$$

In the above notation, Yi is one of the promotion
variables. B0 is the linear intercept, and B1 and B2 are
coefficients for the continuous variables OAFQT and PQSCR.
B3 and B4 are coefficients for the discrete and ordinal
variables HIYRED and NCOE. D1 through D5 represent the dummy
variables for RACETH, and D6 represents the dummy variable
for SEX.

The  data  set  of  37,854  records  was  randomly  split  into 
two  separate  data  files  for  regression  analysis.  This 
provided  for  a  different  data  set  to  confirm  analysis  of 
regression coefficients from the first set. Paragraph e.1.
of  this  section  compares  resulting  regression  coefficients  of 
the  model  using  the  second  data  set. 
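A random split of this kind can be sketched as follows. The thesis does not state the relative size of the two files or how the randomization was done, so the equal halves, the seed, and the function name below are assumptions.

```python
import random


def split_records(record_ids, seed=1986):
    """Shuffle the record identifiers and cut them into two non-overlapping files."""
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    shuffled = list(record_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# 37,854 records, identified here simply by position.
fit_set, confirm_set = split_records(range(37854))
print(len(fit_set), len(confirm_set))
```

Coefficients estimated on the first file can then be compared with those from the second; close agreement suggests that the estimates are not an artifact of one particular sample.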
b.   Results 

Table XVI lists the regression results of the
basic model variables. When computing the models for PRATE
and RATE, the effects variables CMF and PAYGD (for PRATE)
and CMF alone (for RATE) were reintroduced into the set of
explanatory variables. This allowed for comparison of variable
coefficients and R2 value changes as the dependent variable
became more restricted. In Table XVI the top section shows
the ANOVA results of the model and reports the F and R2
statistics. Each column then gives the regression results of
each promotion rate model, including a Pr>T value as a measure
of the strength of rejection for a null hypothesis of zero
for the estimate value. Values of Pr>T less than .05 are
considered acceptable for consideration of that variable.

TABLE XVI   Regression Results

                    PRATE           RATE            PRA

Added Variables     CMF, PAYGD      CMF             None
ANOVA F             1317.4          360.3           218.5
Pr>F                .0001           .0001           .0001
R2                  .3116           .0948           .0546

Intercept           0.022222        -1.03692        -1.28822
  (std error)       (.002558)       (.055368)       (.05600)
  Pr>T              .0001           .0001           .0001

OAFQT               .0001355        .0058817        .0042608
  (std error)       (.00000871)     (.0002444)      (.0002492)
  Pr>T              .0001           .0001           .0001

HIYRED              .0005341        .148352         .139484
  (std error)       (.000152)       (.004851)       (.0049298)
  Pr>T              .0001           .0001           .0001

PQSCR               .000089         .001608         .00327211
  (std error)       (.000014)       (.000449)       (.0004583)
  Pr>T              .0001           .0001           .0001

SEX                 -.0008582       .022904         .0564079
  (std error)       (.00050325)     (.01562)        (.0155310)
  Pr>T              .088*           .1427*          .0003

NCOE                .00008839       .012688         .0073740
  (std error)       (.00000625)     (.0017808)      (.0017949)
  Pr>T              .1573*          .0001           .0001

D1 (RACETH)         .0026347        .053088         .01497054
  (std error)       (.0011286)      (.035653)       (.0363905)
  Pr>T              .0196           .1365*          .6808*

D2 (RACETH)         -.0037888       -.096320        -0.0898693
  (std error)       (.0011266)      (.035570)       (.0363089)
  Pr>T              .0008           .0068           .0013

D3 (RACETH)         -.0009404       -.0239592       -.0417668
  (std error)       (.001279)       (.040383)       (.04122033)
  Pr>T              .4623*          .5530*          .3109*

D4 (RACETH)         .00028892       .089059         .01007473
  (std error)       (.0032534)      (.102707)       (.1048355)
  Pr>T              .3745*          .3859*          .9234*

D5 (RACETH)         -.000224        -.021530        -.0138649
  (std error)       (.0018127)      (.0572261)      (.058409)
  Pr>T              .9016*          .7067*          .8124*

CMF                 -.000147        -.0053672       NA
  (std error)       (.0000052)      (.0001654)
  Pr>T              .0001           .0001

D7 (PAYGD)          .060127         NA              NA
  (std error)       (.0017904)
  Pr>T              .0001

D8 (PAYGD)          .017999         NA              NA
  (std error)       (.001774)
  Pr>T              .0001

Observations from the regression table are summarized in
the following paragraphs.

The  input  variables  OAFQT,  HIYRED,  and  PQSCR  all 
maintained  a  positive  and  statistically  significant 
coefficient  value  across  all  three  dependent  variables. 

The  inclusion  of  PAYGD  with  the  PRATE  variable 
significantly  increased  the  R2  value  of  the  model. 
Conversely,  the  influence  of  OAFQT,  HIYRED,  PQSCR,  and  the 
other  explanatory  variables  was  severely  diminished. 

The RATE model is very similar to the PRA model, and has
generally larger estimate values and a higher R2. However,
the estimates for RACETH and SEX did not have significant T
values.

The PRA model, although having a lower R2 value and
generally smaller estimate values, had an acceptable T test
result for SEX. Additionally, the PRA model contained one
less nominal explanatory variable, CMF. The PRA model, then,
has fewer, and more reliable, nominal explanatory variables.
Since the objective of the study was to focus on academic and
educational measures as predictors of promotion, the PRA
model was chosen as the most effective predictive model.
Subsequent analysis of regression coefficient results was
conducted with the PRA model.
c.   Interpretation 

Interpretation of the regression coefficients will include
two points. First, the explanatory variables which can effect
the greatest change in the dependent variable will be
identified. Secondly, an example will demonstrate the amount
of change in a given explanatory variable required to achieve
a five percent shift in the PRA estimate.

The  amount  of  change  in  PRA  caused  by  a  change  of  one  unit 
of  an  explanatory  variable  can  be  read  directly  from  the 
regression  coefficients.  However,  the  total  amount  of  change 
that  an  explanatory  variable  can  cause  in  PRA  depends  on  the 
range  of  the  explanatory  variable.  Table  XVII  gives  an 
ordered  listing  of  the  explanatory  variables,  excluding 
categorical  variables,  from  most  to  least  total  influence  as 
measured  by  Net  Possible  Change.  The  net  possible  change  is 
simply  the  number  of  units  in  the  range  of  the  explanatory 
variable  multiplied  by  the  coefficient  estimate. 


TABLE XVII   Net Possible Change by Explanatory Variable

Variable     Range      Estimate      Net Possible Change
HIYRED       1-12       .13948378     1.6738
OAFQT        1-99       .00426083     0.4218
PQSCR        21-100     .00327212     0.2585
NCOE         0-14       .00737408     0.1106
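
The net possible change calculation can be sketched in a few lines of Python. This is only an illustration: the unit counts are not stated in the text and are inferred here from the published products. HIYRED, OAFQT, and NCOE reproduce the tabled values when the number of levels (maximum minus minimum plus one) is used, while the PQSCR entry corresponds to 79 units (maximum minus minimum). The function and variable names are hypothetical.

```python
# Illustrative sketch: net possible change = units in range * coefficient.
# Unit counts are inferred from the published products in Table XVII
# (an assumption; the thesis does not state them).

TABLE_XVII = {
    # variable: (coefficient estimate, units in range, published value)
    "HIYRED": (0.13948378, 12, 1.6738),
    "OAFQT": (0.00426083, 99, 0.4218),
    "PQSCR": (0.00327212, 79, 0.2585),
    "NCOE": (0.00737408, 15, 0.1106),
}


def net_possible_change(estimate, units):
    """Largest shift in PRA attributable to one variable across its range."""
    return estimate * units


def ranked_by_influence(table):
    """Variable names ordered from most to least net possible change."""
    return sorted(
        table,
        key=lambda name: net_possible_change(table[name][0], table[name][1]),
        reverse=True,
    )


for name in ranked_by_influence(TABLE_XVII):
    estimate, units, published = TABLE_XVII[name]
    computed = net_possible_change(estimate, units)
    print(f"{name:7s} computed {computed:.4f}  published {published:.4f}")
```

The ordering produced by the sketch matches the ordering of the table, with HIYRED well ahead of the other three variables.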

In a qualitative sense, the sensitivity of PRA to each
explanatory variable can be demonstrated by deriving the
number of explanatory variable units needed to move from the
median PRA value up five percent.

To compute the average value for PRA, the population
average for each explanatory variable was entered into the
regression model. The resulting PRA value was 0.0185, which,
using the normal approximation, lies at the 50.7 percentile
of the PRA distribution. An upward shift of 5 percent would
then require the PRA value to lie at the 55.7 percentile.
Using the standard normal tables to approximate the PRA
distribution, the PRA value corresponding to its 55.7
percentile was 0.1434. Checking the sensitivity of each
explanatory variable consisted of changing a single
explanatory variable a sufficient number of units to result
in a PRA value of 0.1434, while holding all other explanatory
variables at the population average. Table XVIII tabulates
the increase of explanatory variable units necessary to
produce a 5 percent upward shift in PRA percentile.
Alternatively, if the amount required to reach the 55.7
percentile was not possible within the range of the input
variable, the maximum amount of available change was listed.


TABLE XVIII   Sensitivity of PRA to Explanatory Variables

Variable     Average Value     Change to     PRA % Change
HIYRED       6.01              7.0           55.9
OAFQT        45.3              74.0          55.7
NCOE         3.06              14.0*         54.0
PQSCR        78.4              99.0*         53.4

*max value
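
The sensitivity procedure can be approximated with the standard normal distribution from the Python standard library. The sketch below is an assumption-laden illustration, not the original computation: it treats PRA as exactly standard normal, takes the coefficients from Table XVII and the averages and maximum values from Table XVIII, and uses hypothetical names throughout. Because the table appears to round the required change to convenient units, the computed values agree with the published ones only approximately.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()   # PRA treated as approximately standard normal

BASE_PRA = 0.0185           # predicted PRA with every input at its average
TARGET_PERCENTILE = 0.557   # the 50.7 percentile shifted up five points

# variable: (coefficient estimate, population average, maximum value)
VARIABLES = {
    "HIYRED": (0.13948378, 6.01, 12.0),
    "OAFQT": (0.00426083, 45.3, 99.0),
    "NCOE": (0.00737408, 3.06, 14.0),
    "PQSCR": (0.00327212, 78.4, 99.0),
}


def sensitivity(estimate, average, maximum):
    """Return (new value, resulting percentile) for a one-variable shift."""
    target_pra = STD_NORMAL.inv_cdf(TARGET_PERCENTILE)
    needed = (target_pra - BASE_PRA) / estimate       # units required
    new_value = min(average + needed, maximum)        # capped at the maximum
    new_pra = BASE_PRA + estimate * (new_value - average)
    return new_value, 100 * STD_NORMAL.cdf(new_pra)


for name, params in VARIABLES.items():
    new_value, percentile = sensitivity(*params)
    print(f"{name:7s} change to {new_value:6.2f} -> percentile {percentile:.1f}")
```

Variables whose required change exceeds the available range are capped at the maximum, which corresponds to the starred entries in the table.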

Interpretation of the coefficient values clearly
demonstrates that HIYRED is the most important explanatory
variable. This observation is understandable since the
structure of the variable is discrete, and changes to
adjacent values represent major distinctions in educational
background. The example of shifting from a value of six to a
value of seven represents the difference between having a
high school degree and having gone to one year of college. In
percentages of HIYRED, that constitutes moving from a large
center group of high school qualified NCO's to the upper
ninety percent of the HIYRED distribution.

OAFQT is the second most significant explanatory variable.
A shift of roughly one quarter of its range, i.e., 45 to 75,
can change PRA plus or minus five percent. The other
explanatory variables, NCOE and PQSCR, have considerably less
influence on the dependent variable.
d.   Checking  of  Assumptions 

To verify the requirements for the regression
model, residual analysis was performed using the Grafstat
program. Representative plots of the OAFQT residual are
shown in Figures 4.18 and 4.19.


Figure 4.18  Regression Residual Histogram

Figure 4.19  Regression Residual Scatter Plot (N=500)

The histogram of residuals, shown in Figure 4.18,
demonstrates that the residual distribution is approximately
normal. Homoscedasticity is checked in Figure 4.19, in which
residuals have been plotted against the OAFQT variable.
There do not appear to be any patterns in the plots of the
residuals, and the uniform pattern was considered sufficient
to justify the assumption of homoscedasticity. Lastly, since
each observation represents a different person, the
independence of each observation from one another is assumed
true.

e.   Confirmation  of  Regression  Findings 

(1) Second Data Set. Regression analysis was
conducted on the second partition of the data set. A
comparison of those results with the first data set is shown
in Table XIX.


TABLE XIX    Comparison of Regression Data Sets

Independent                        PRA
Variable            1st Set                   2nd Set
Estimator       Coeff      Std Err        Coeff      Std Err
OAFQT           .004260    (.00025)       .004729    (.00032)
HIYRED          .139483    (.00493)       .131559    (.00636)
PQSCR           .003272    (.00046)       .003197    (.00060)

The  above  results  are   felt  to   be  sufficiently  comparable 
to  accept  the  original  model  coefficient  scores. 
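
The text gives no formal criterion for "sufficiently comparable." One conventional check, offered here only as a sketch, is a z statistic for the difference between two independent estimates, using the coefficients and standard errors of Table XIX. This assumes the two partitions are independent and the estimates approximately normal; it is not a procedure described in the study, and the names are hypothetical.

```python
from math import sqrt

# variable: (coefficient, standard error) for each partition, from Table XIX
FIRST_SET = {
    "OAFQT": (0.004260, 0.00025),
    "HIYRED": (0.139483, 0.00493),
    "PQSCR": (0.003272, 0.00046),
}
SECOND_SET = {
    "OAFQT": (0.004729, 0.00032),
    "HIYRED": (0.131559, 0.00636),
    "PQSCR": (0.003197, 0.00060),
}


def z_difference(first, second):
    """z statistic for the difference of two independent estimates."""
    (b1, se1), (b2, se2) = first, second
    return (b1 - b2) / sqrt(se1 ** 2 + se2 ** 2)


z_scores = {
    name: z_difference(FIRST_SET[name], SECOND_SET[name]) for name in FIRST_SET
}
for name, z in z_scores.items():
    print(f"{name:7s} z = {z:+.2f}")
```

Under these assumptions none of the three differences exceeds the usual two-sided 1.96 threshold, which is consistent with accepting the original coefficients.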

(2) Nonparametric Regression. Since the model
contained an ordinal variable, HIYRED, a regression result
using nonparametric terms was included as a confirmatory
measure. Nonparametric regression produced the same linear
least squares approximation for the model estimates, so the
regression coefficient for HIYRED was still 0.1395. However,
for nonparametric regression the test for the acceptance of
the estimate value used the Spearman rank correlation
coefficient. The regression coefficient for HIYRED was
tested using this procedure.

First, for each value of PRA and HIYRED a predicted value
U was found by computing U = PRA - (0.1395 * HIYRED). Then,
the Spearman rank correlation coefficient, rho, was computed,
based on the ranks of HIYRED and the ranks of U. It was
found to be 0.02482 with a Pr>|R| of 0.0001. In this test
the null hypothesis was that the value of the regression
coefficient was equal to 0.1395, the value found in
regression. [Ref. 13:pp. 265-271] To test the null
hypothesis, that the regression coefficient estimate is
correct, rho was compared against a rejection region computed
using the two tailed Spearman quantile, with a normal
approximation. The rejection regions for this Spearman
correlation parameter were values less than 0.0085 or greater
than 0.9915. Since the value of rho did not fall inside
either rejection region, the null hypothesis could not be
rejected, and a HIYRED regression coefficient of 0.1395 was
acceptable.
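
The mechanics of the rank-based check (form U from the hypothesized slope, then correlate the ranks of HIYRED with the ranks of U) can be sketched in plain Python. The data below are hypothetical and chosen only to show the calculation; they are not drawn from the study, and the function names are illustrative.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def slope_check_rho(x, y, slope):
    """rho between x and U = y - slope * x for a hypothesized slope."""
    u = [yi - slope * xi for xi, yi in zip(x, y)]
    return spearman_rho(x, u)


# Hypothetical illustration data (NOT thesis data).
hiyred = [1, 2, 3, 4, 5, 6]
noise = [0.10, -0.20, 0.05, 0.00, -0.10, 0.15]
pra = [0.1395 * h + e for h, e in zip(hiyred, noise)]

print("rho at slope 0.1395:", round(slope_check_rho(hiyred, pra, 0.1395), 3))
print("rho at slope 0.0   :", round(slope_check_rho(hiyred, pra, 0.0), 3))
```

When the hypothesized slope is close to the true one, U retains little monotone relationship with the explanatory variable and rho is small; a badly wrong slope leaves a strong rank correlation.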

f.   Testing the Model

The model coefficients found by regression were
tested in two ways. First, a predicted promotion rate value
was computed for the extremes and average of the model. The
extreme values used the minimum or maximum values for the
input variables. The average promotion rate was computed
using sample averages for all input variables. The resulting
predictions were then compared against the actual
distribution percentiles.

Secondly, subsets of the sample population had average
promotion rates predicted using categorical values and sample
population averages. The resulting predictions were compared
against the actual sample values. Again, percentile values
for PRA were found by using a standard normal table
approximation.


TABLE XX   Comparison of Extreme and Average Predictions

                        Model                     Data Sample
Prediction      PRA Value    Percentile     PRA Value    Percentile
Minimum         -1.0009      15.7%          -1.558       5%
                (.1000)      (3.5%)
Maximum         1.23029      89.1%          1.7866       95%
                (.4098)      (9.9%)
Average         0.01839      50.7%          -0.04146     50%
                (0.223)      (8.5%)

The model predictions were very accurate at the average
level, but this accuracy diminished at the extremes.

The second test for the model was one where specific
population subcategories had their average PRA value
predicted. The subcategories represented were four
combinations of SEX and the black and white RACETH variables.
Additionally, predictions were made to check the average
promotion rate of all NCO's with a HIYRED value of 10, and
all NCO's with an OAFQT of 85. As in the previous table,
unless the input variable was being used as a subcategory, its
value was set to the overall population average. Table XXI
shows the results of the predictions.


TABLE XXI   Comparison of Predicted vs Actual PRA Averages

Subcategory      Predicted %       Sample %     Sample Size
                 (Lower-Upper)
Male/White       55.1              53.1         18,003
                 (45.7-64.2)
Male/Black       49.5              44.3         12,121
                 (40.3-58.9)
Female/Black     47.3              47.7         2,485
                 (37.7-56.1)
Female/White     52.9              59.5         1,842
                 (44.1-61.5)
HIYRED=10        71.7              75.7         969
                 (63.5-79.3)
OAFQT=85*        57.4              60.2         2,129
                 (44.7-69.4)

*The sample data point estimate was averaged over a
range of OAFQT 80 to 90.
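
The comparison in Table XXI reduces to two simple checks per subcategory: the signed prediction error, and whether the sample percentile falls inside the predicted lower and upper bounds. The following sketch tabulates both from the published figures; the names are hypothetical and the code adds no data beyond the table.

```python
# (subcategory, predicted %, lower bound, upper bound, sample %) from Table XXI
TABLE_XXI = [
    ("Male/White", 55.1, 45.7, 64.2, 53.1),
    ("Male/Black", 49.5, 40.3, 58.9, 44.3),
    ("Female/Black", 47.3, 37.7, 56.1, 47.7),
    ("Female/White", 52.9, 44.1, 61.5, 59.5),
    ("HIYRED=10", 71.7, 63.5, 79.3, 75.7),
    ("OAFQT=85", 57.4, 44.7, 69.4, 60.2),
]


def check_prediction(row):
    """Return (name, predicted minus sample, sample within bounds?)."""
    name, predicted, lower, upper, sample = row
    return name, predicted - sample, lower <= sample <= upper


results = [check_prediction(row) for row in TABLE_XXI]
for name, error, inside in results:
    print(f"{name:13s} error {error:+5.1f}  within bounds: {inside}")
```

Every sample percentile lies within its predicted bounds, although the bounds are wide enough that this is a fairly weak test of the model.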


Testing of the regression model indicates that it was
reasonably effective if used with input changes of the
nominal variables, such as SEX and RACETH. Changes in the
value of HIYRED produce reliable estimates, and demonstrate
the considerable contribution of this variable as a predictor
of PRA. The continuous variable OAFQT is difficult to test;
since it is a continuous variable the model estimate was
taken over a range of values. Predicted results are close to
the sample value, but the variance of the estimate still
spans the median. OAFQT does move the predicted values of
PRA in the right direction, but its effectiveness is severely
hampered by its variance and diminishing ability to provide
an accurate prediction value as PRA approaches either
extreme. Other prediction estimates were attempted using
OAFQT and their results demonstrated the same lack of
predictive ability away from the center percentiles.
g.   Summary  of  Regression  Analysis 

Regression analysis provided estimates of the
independent contribution of several key variables to
predicting a promotion rate. They include a measure of
intelligence aptitude, OAFQTP, a measure of academic ability,
HIYRED, two measures of military performance, PQSCR and NCOE,
and two nominal values, SEX and RACETH.

Testing of these estimates shows that the predictive
ability of the model is limited to those variables which have
very distinct abilities to subcategorize the sample
population. These variables are the SEX, RACETH, and HIYRED
variables. The continuous variables OAFQT and PQSCR cannot
be relied upon to independently yield estimates of PRA, but
can effect limited shifts of the PRA distribution within a
subcategory.

E.   SUMMARY  OF  FINDINGS 

Chapter IV was the principal analytical exercise in this
study. It progressed through ascending stages of analysis
and resulted in an inferential model with a restricted and
independent set of explanatory variables. These explanatory
variables did, in fact, rely on levels of intelligence tests
and academic background as values to predict promotion.

The model, however, demonstrated only limited utility as a
predictive equation. It could only match the sample data when
it was describing an average promotion rate among a large
population subcategory. This would occur only where the
change in the explanatory variable had a significant
partitioning effect on the population.

The  next  two  chapters  will  investigate  the  relationship  of 
intelligence  and  academic  ability  as  a  predictor  of  promotion 
rate  but  through  different  procedures. 

V.   ANALYSIS OF TOP PERFORMERS

A.   INTRODUCTION 

This chapter took an ad hoc approach to identify any
trends which distinguish top performers, on the basis of
promotion rate, from their peers. Top performers consist of
the top three percent of the population, or 1,047
individuals, according to PRA scores. This data set was
referred to as the TOP data set, while the remainder were
referred to as the SAMPLE data set.

Analysis consists of three sections. The first section
is a comparative tabulation of means and variances. Results
shown in this section confirmed the majority of sample
characteristics predicted in Chapter IV, such as higher
EIMCAT and OAFQT scores. There were, however, discrepancies
with respect to TOP distribution values of RACETH, NCOE and
PAYGD. Those discrepancies are investigated in later
sections of this chapter. The second section reports the
results of formal hypothesis testing for differences in means
between each of the explanatory variables. The last section
investigates the discrepancies associated with RACETH, NCOE,
and PAYGD. Through a presentation of graphics demonstrating
internal shifts of those variable distributions, an effect
which appears to interrelate the three distributional
discrepancies is identified.

B.   COMPARISON OF MEANS AND VARIANCE

The  tabulated  means  and  variances  of  the  study  variables 
for  the  top  three  percent  and  for  the  remainder  of  the  entire 
sample  are  presented  in  Table  XXII.  The  last  column  in  the 
table  shows  the  percentage  and  direction  that  the  TOP  data 
set  differed  from  the  SAMPLE. 


TABLE XXII   Top vs Sample Summary Data

Variable/Type        Top 3%                Sample          Comment
                 Mean      Std Dev     Mean      Std Dev
Promotion
RATE             2.06      .392        0.00      1.00
PRATE            .178      .037        .109      .036
PRA              2.33      .350        0.00      1.00
Intelligence
AFQTP            64.69     22.01       53.4      20.9      Top 17.5% >
OAFQTP           61.60     23.24       45.3      24.7      Top 26.4% >
EIMCAT           6.11      1.31        5.07      1.28      Top 17.0% >
GTSCR            113.17    14.70       108.3     14.2      Top 4.1% >
HIYRED           6.88      1.59        6.01      1.07      Top 12.6% >
EDLVL            7.12      1.55        6.32      .97       Top 11.2% >
PQSCR            80.57     11.31       78.4      1.6       Top 2.6% >
NCOE             2.31      2.50        3.06      2.81      Top 33% <
Effects
SEX              1.18      .390        1.12      .328      Top 5% >
CMF              62.09     27.146      51.9      31.3      Top 16% >
RACETH           1.58      .975        1.65      .942      Top 4% <
PAYGD            5.19      .405        5.27      .464      Top 3% <

Observations derived from the data in Table XXII can be
summarized as follows:

The four aptitude test variables, GTSCR, AFQTP, OAFQTP, and
EIMCAT, all demonstrate a strong positive difference between
the TOP and SAMPLE scores. The AFQT related scores are about
twenty percent greater, with GTSCR greater by four percent.
The variables EDLVL and HIYRED were both positive, with
HIYRED slightly larger at twelve percent. PQSCR increased
slightly.
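
The percentage differences in the Comment column of Table XXII can be approximated as sketched below. The base of the percentage is not stated in the text; the figures for most variables are reproduced when the difference in means is expressed as a fraction of the TOP mean, so that base is assumed here. A few entries (for example GTSCR and PAYGD) are not matched exactly by this formula. The names are hypothetical.

```python
# variable: (TOP mean, SAMPLE mean, published percent) from Table XXII
MEANS = {
    "AFQTP": (64.69, 53.4, 17.5),
    "OAFQTP": (61.60, 45.3, 26.4),
    "EIMCAT": (6.11, 5.07, 17.0),
    "HIYRED": (6.88, 6.01, 12.6),
    "EDLVL": (7.12, 6.32, 11.2),
    "NCOE": (2.31, 3.06, 33.0),
}


def percent_difference(top_mean, sample_mean):
    """Signed difference in means as a percentage of the TOP mean.

    Using the TOP mean as the base is an assumption inferred from the
    published figures, not something the text states.
    """
    return 100 * (top_mean - sample_mean) / top_mean


for name, (top_mean, sample_mean, published) in MEANS.items():
    computed = percent_difference(top_mean, sample_mean)
    direction = ">" if computed > 0 else "<"
    print(f"{name:7s} {abs(computed):5.1f}% {direction}  (published {published}%)")
```

A negative result corresponds to the "<" entries in the table, where the TOP mean is below the SAMPLE mean.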

The effects variables SEX and CMF both increased, with
CMF demonstrating a significant increase. The change in CMF
was an unexpected result of subsetting to the top three
percent. The PRA variable was designed to be independent of
CMF, and it should not have been affected as significantly as
it was.

The only variables which decreased in proportion between
SAMPLE and TOP were NCOE, RACETH, and PAYGD. Of the three,
the decrease in NCOE was the largest. The change in NCOE was
also an unexpected result. Regression analysis indicated that
NCOE had a positive influence on PRA. To have NCOE decrease
with top performers is the reverse result. Paragraph D of this
section will attempt to explain the reason for this anomaly.

C.   SIGNIFICANCE  TESTING 

Significance testing for means of the explanatory
variables between the TOP and SAMPLE data sets was included as
a formal statistical confirmation of differences between the
two data sets. Testing using nonparametric methods was
utilized since the study variables were either discrete, or
if continuous, did not meet the Kolmogorov-Smirnov one-sample
test for a normal distribution. The type of nonparametric
test used depended on the scale type of the variable and
whether it was continuous or discrete.

TABLE XXIII   Top vs Sample Hypothesis Results

Variable     Test Used                     Results
Intelligence
GTSCR        Kruskal-Wallis Test¹          Chisq = 671     Strongly reject H0:
AFQTP        Kruskal-Wallis Test           Chisq = 1165    Strongly reject H0:
OAFQTP       Kruskal-Wallis Test           Chisq = 1418    Strongly reject H0:
EIMCAT       2 x C Contingency Table²      Chisq = 503     Strongly reject H0:
HIYRED       2 x C Contingency Table       Chisq = 931     Strongly reject H0:
EDLVL        2 x C Contingency Table       Chisq = 700     Strongly reject H0:
PQSCR        Kruskal-Wallis Test           Chisq = 26.1    Reject H0:
NCOE         2 x C Contingency Table
Effects
SEX          2 x C Contingency Table       Chisq =
CMF          2 x C Contingency Table       Chisq =         Strongly reject H0:
RACETH       2 x C Contingency Table       Chisq =         Reject H0:
PAYGD        2 x C Contingency Table       Chisq =         Strongly reject H0:

¹ For this nonparametric test the null hypothesis is that
the populations are identical. The alternate hypothesis is
that one of the populations yields larger observations. With
two populations this is equivalent to a Mann-Whitney test.
At a level α of .95 the critical Chisquare value for
rejection is Chisq > 3.84.

² For this nonparametric test the null hypothesis is that
the two populations have the same distribution as measured by
the probability of falling into one of the discrete variable
classifications. The alternate hypothesis is that the
distributions are different. The contingency table is set
for the two rows to be the classification of PRA > 1.93 and
PRA < 1.93; the C represents the number of discrete levels in
the variable being tested. The Chisquare test statistic is
also used for this test with a rejection of H0: when Chisq is
larger than 3.84 at a .95 level α.
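
As a rough illustration of the 2 x C contingency test described in the second footnote, the sketch below computes the Pearson chi-square statistic from a table of counts and recovers the 3.84 critical value for one degree of freedom. The counts are hypothetical, not study data, and the function name is illustrative. Note that a 2 x C table has C - 1 degrees of freedom, so 3.84 is strictly the .95 critical value only when C = 2; with more levels the critical value is larger, although the statistics reported in Table XXIII are far above any such threshold.

```python
from statistics import NormalDist


def chi_square_2xc(table):
    """Pearson chi-square statistic and degrees of freedom for 2 x C counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic, len(col_totals) - 1


# With one degree of freedom, chi-square is the square of a standard normal,
# so the .95 critical value is the squared two-sided normal quantile.
CRITICAL_DF1 = NormalDist().inv_cdf(0.975) ** 2

# Hypothetical counts (NOT thesis data): rows are TOP and SAMPLE,
# columns are the discrete levels of the variable being tested.
counts = [[10, 20], [20, 10]]
statistic, df = chi_square_2xc(counts)
print(f"Chisq = {statistic:.2f}, df = {df}, critical = {CRITICAL_DF1:.2f}")
```

When both rows have identical proportions across the levels, every expected count equals the observed count and the statistic is zero.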

Hypothesis testing confirms the observations made on
simple means and variances of the study variables. The
strength of the difference can be interpreted by the
magnitude of the Chi-square statistic.

D.   ANALYSIS  OF  DISTRIBUTIONS 

This section further investigates the shifts in
distributions for those variables which conflicted with the
relationships derived in regression and correlation analysis.
Those variables were CMF, NCOE and PAYGD. Again, the
conflicts which arose were two-fold.

First, neither CMF nor PAYGD should have been affected by
subsetting of the PRA variable. The PRA scores are normalized
differences from the average score for every paygrade and CMF
combination. Assuming a uniform application of promotion
policy, then, no one CMF or paygrade should have dominated as
a result of subsetting to the top three percent. Secondly,
NCOE should have increased slightly rather than decreased
significantly by subsetting to the top three percent.

The three inconsistencies appear to be linked in their
distributional change. Figures 5.1, 5.2, and 5.3 demonstrate
this.

Figure 5.1  Top versus Sample CMF Changes in Percent

Figure 5.1 demonstrates a clearly defined redistribution of
CMF percentages away from combat arms MOS's to the combat
service support MOS's. In particular, Infantry, Artillery,
and Armor MOS's lost a total of 15.5 percent, while the
Administrative Specialists (CMF 71) gained almost 9 percent.

Figure 5.2

Figure 5.2 demonstrates transfer of a large percentage of
the sample density away from the NCOE 7 to the NCOE 0 level.
This was consistent with the observations in Figure 5.1,
since only combat arms NCO's qualify for level 7, the Combat
Arms Primary Leadership course.

Figure 5.3  Top vs Sample PAYGD

The last figure, Figure 5.3, shows a displacement of
percentage from the E-6 to the E-5 paygrade as a result of
extracting only the top three percent by measure of promotion
rate.

To offer an explanation of the underlying reason for
these discrepancies is difficult. Some measure of this
discrepancy may well be explained in that the removal of
effects by normalizing the PRA scores was not entirely
adequate. The observed discrepancy may be simple
mathematical error. However, it can be noted that their
interrelationships do act consistently. Specifically, the
reduction in paygrade and combat MOS's both combine to
significantly reduce the NCOE level. As such, it is more
likely that the change in NCOE occurred coincident with the
changes in the two variables PAYGD and CMF. The effect being
demonstrated was one where junior combat service support
NCO's were dominating promotion achievement.

E.   SUMMARY  OF  FINDINGS 

Comparing the changes in averages for the top performers
to the regression coefficients found in Chapter IV shows
very substantial agreement. Specifically, OAFQT was the most
significant intelligence test variable, while HIYRED was the
most significant academic variable. Although the percent
change in OAFQT is greater than that in HIYRED, OAFQT still
has considerably more variance than HIYRED. Thus, the
predictive ability of HIYRED in regression should be more
pronounced than that of OAFQTP. The less significant
variables PQSCR, SEX, and RACETH each shifted a small,
significant amount in the appropriate direction.

The  only  discrepancy  between  the  two  procedures  is  the 
change  in  the  variable  NCOE.  This  change  is  felt  to  have 
been  induced  by  changes  in  the  CMF  and  PAYGD  distributions. 
The  effect  is  one  where  junior  combat  service  support  NCO's 
replace  NCO's  from  the  combat  MOS's. 

An important observation from analysis of the top three
percent was that the increase in the value of any explanatory
variable was not extreme. In fact, the largest increase was
only twenty-five percent. As an inference, it appears that
NCO's who do a little better in a combination of areas,
rather than much better in a single area, are more likely
recipients of faster promotion rates.

VI.   PRINCIPAL COMPONENTS AND FACTOR ANALYSIS

A.  INTRODUCTION 

In  this  chapter  more  advanced  statistical  procedures  are 
implemented  to  better  summarize  the  independent  variables, 
and  improve  or  at  least  simplify  the  cause-effect  model. 
Principal  components  and  factor  analysis  are  two  closely 
related  procedures  which  are  normally  used  in  investigating 
the  mutual  relationships  and  communalities  of  a  large  number 
of  variables.  By  identifying  redundant  variables,  and  by 
constructing  composite  variables  of  the  originals,  it  is 
possible  to  reduce  the  number  of  independent  explanatory 
variables  to  only  those  which  are  significant  and  unique. 

B.  THEORY 

Principal components and factor analysis each use matrix
algebra to operate on a P by P matrix of correlation or
covariance coefficients and produce a system of eigenvectors
of the form:

$Y_j = a_{1j}X_1 + a_{2j}X_2 + \dots + a_{pj}X_p + E$

In the notation, $Y_j$ represents the resultant composite
variable which is the linear combination of the loading
coefficients, $a_{ij}$. These loading coefficients multiply
each of the original variables $X_n$, n=1..p. E represents
the amount of residual error not accounted for by the linear
model. [Ref. 5:p. 328] The resulting eigenvectors represent a
set of orthogonal components jointly perpendicular in the
space of the original variables. [Ref. 15:p. 424] These
components are jointly uncorrelated and individually account
for levels of variance, where the first principal component
accounts for the largest proportion, and the last principal
component accounts for the smallest. A resulting component
may be representative of some aggregate characteristic of the
original input variables. For example, a resulting
eigenvector which has strong factor loadings for original
variables of physical strength and endurance could be called
a factor of stamina as an aggregate measure. Principal
components and factor analysis differ in that principal
components assume and require that a number of components
equal to the number of initial variables is needed to account
for the total variance. In contrast, the factor method
assumes that there exists a set of composites in a dimension
smaller than the dimension of the original number of
variables which will suffice. [Ref. 5:p. 622]

An additional aspect of factor analysis is that it allows
for rotation of the solution with the intent of developing
more unique and well-defined components. For example, if
there are five variables in a factor which have intermediate
loading factors in the range .2 to .4, a rotation of common
factors by applying nonsingular linear transformations may
result in a pattern matrix in which the loadings are either
zero or close to one. The end result is easier to interpret
than the factor with numerous mixed elements. Graphical
measures are useful with the rotation procedure and allow the
analyst to see the relative uniqueness of the input
variables.
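
The smallest possible case gives a feel for the eigenvector system described above. For two standardized variables with correlation r, the correlation matrix has eigenvalues 1 + r and 1 - r, with loadings proportional to (1, 1) and (1, -1). The sketch below is purely illustrative: the correlation value and all names are hypothetical and unrelated to the study data.

```python
from math import sqrt


def principal_components_2x2(r):
    """Eigenvalues and unit eigenvectors of [[1, r], [r, 1]], largest first."""
    s = 1 / sqrt(2)
    components = [(1 + r, (s, s)), (1 - r, (s, -s))]
    return sorted(components, key=lambda c: c[0], reverse=True)


def mat_vec(matrix, vector):
    """Multiply a small matrix (list of rows) by a vector."""
    return tuple(sum(m * v for m, v in zip(row, vector)) for row in matrix)


r = 0.8  # hypothetical correlation between two standardized variables
correlation = [[1.0, r], [r, 1.0]]
components = principal_components_2x2(r)

for eigenvalue, loadings in components:
    share = eigenvalue / len(correlation)   # proportion of total variance
    print(f"eigenvalue {eigenvalue:.2f}  loadings {loadings}  share {share:.0%}")
```

The two loading vectors are perpendicular, and the first component carries the larger share of the variance, which mirrors the ordering property described in the text.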

C.   RESULTS 

The SAS procedure for performing factor analysis was used
with the method of factor determination being the principal
component method. As such, basic principal component
analysis was conducted, but limits were applied on the number
of factors retained so that only the most significant
composite factors would be kept. The first set of input
variables included all twelve of the study variables. Table
XXIV shows the resulting factor solution. Appended below
each component is an interpretation explaining what the
aggregate factors represent. The original input variables
which contributed most to the factor have been underlined.
Following Table XXIV is a factor plot, Figure 6.1, where
each of the variables is coded by a letter. By observing the
plot, any lack of uniqueness for a group of variables can be
noted where the coded letters are close to one another.

TABLE XXIV   Principal Components Tabular Results

Input Matrix of correlation coefficients
PRIOR COMMUNALITY ESTIMATES: ONE

                 1        2        3        4        5        6        7
EIGENVALUE    4.0052   1.7334   1.4979   1.0634   0.8496   0.8028   0.7542
DIFFERENCE    2.2717   0.2355   0.4344   0.2138   0.0468   0.0486   0.2149
PROPORTION    0.3338   0.1445   0.1248   0.0886   0.0708   0.0669   0.0628
CUMULATIVE    0.3338   0.4782   0.6031   0.6910   0.7625   0.8294   0.8922

                 8        9       10       11       12
EIGENVALUE    0.5392   0.3500   0.2809   0.1196   0.0034
DIFFERENCE    0.1892   0.0690   0.1613   0.1161
PROPORTION    0.0449   0.0292   0.0234   0.0100   0.0003
CUMULATIVE    0.9372   0.9663   0.9897   0.9997   1.0000

7 FACTORS WILL BE RETAINED BY THE NFACTOR CRITERION

FACTOR PATTERN

            FACT1    FACT2    FACT3    FACT4    FACT5    FACT6    FACT7
EDLVL       .4302    .5861    .5024   -.2544   -.0624   -.0693   -.029
AFQTP       .9515   -.1133   -.1195    .0637   -.0075    .1548   -.024
EIMCAT      .9060   -.1220   -.1652   -.0598   -.0096    .1478    .011
NCOE       -.0085   -.4507    .6668    .2527   -.0398    .0084   -.134
HIYRED      .3834    .6410    .4176   -.3281   -.0637   -.0830   -.124
SEX         .1735    .4212   -.1113    .6516    .1857   -.0736   -.550
OAFQT       .9518   -.1046   -.1156    .0590   -.0092    .1535   -.023
GTSCR       .8238   -.1128    .0090    .0331   -.0464    .1350    .132
PQSCR       .4001   -.2413    .1205   -.1150   -.7312   -.4527    .115
CMF         .1677    .5200   -.1449    .4985   -.1171   -.2587    .561
PAYGD       .1216   -.3467    .6770    .3367   -.1816   -.0495    .151
RACETH     -.3590    .3130    .2547    .1229    .4708    .6507    .216

            Intell   Acad     Career   Sex      PQSCR    RACE     CMF
            Tests             Status

FINAL COMMUNALITY ESTIMATES: TOTAL = 10.706622
PLOT OF FACTOR PATTERN FOR FACTOR1 AND FACTOR3

[Plot: FACTOR1 on the vertical axis, FACTOR3 on the horizontal axis]

EDLVL=A   AFQTP=B   EIMCAT=C   NCOE=D    HIYRED=E   SEX=F
OAFQT=G   GTSCR=H   PQSCR=I    CMF=J     PAYGD=K    RACETH=L

Figure 6.1
The results appear to be quite reasonable. The most
significant factor is a composite of all the mental aptitude
measures: OAFQT, AFQTP, GTSCR, and EIMCAT. The second
factor consists primarily of the academic performance measures
EDLVL and HIYRED. The third factor is composed of NCOE and
PAYGD and reflects two closely related measures dominated by
paygrade. The fourth factor is predominantly a measure of
SEX and two other nominal variables, CMF and PAYGD. The
fifth, sixth and seventh factors all appear to be dominated
by single variables: PQSCR, RACETH, and CMF respectively.
In short, each of the original twelve variables is in
some measure represented in the first five factors, which
together account for over seventy-five percent of the
variance. By observing the entry for PROPORTION one can see
that the subsequent seven factors each contributed between
.0669 and .0003 of the variance and as such are not major
contributors.
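
The PROPORTION and CUMULATIVE rows of Table XXIV follow directly from the eigenvalues of the correlation matrix. The following is a minimal sketch of that arithmetic, written in Python rather than the statistical package used in the study; the function name variance_explained is hypothetical. Each eigenvalue is divided by the total, which for a correlation matrix of twelve variables is approximately twelve.

```python
from itertools import accumulate

# Eigenvalues as printed in Table XXIV (twelve input variables).
EIGENVALUES = [4.0052, 1.7334, 1.4979, 1.0634, 0.8496, 0.8028,
               0.7542, 0.5392, 0.3500, 0.2809, 0.1196, 0.0034]


def variance_explained(eigenvalues):
    """Return (proportion, cumulative) lists for a set of eigenvalues."""
    total = sum(eigenvalues)
    proportion = [ev / total for ev in eigenvalues]
    cumulative = list(accumulate(proportion))
    return proportion, cumulative


proportion, cumulative = variance_explained(EIGENVALUES)
for k, (p, c) in enumerate(zip(proportion, cumulative), start=1):
    print(f"factor {k:2d}  proportion {p:.4f}  cumulative {c:.4f}")
```

Run on the tabulated eigenvalues, the first five factors come to a little over 0.76 of the total variance, and none of the remaining seven contributes as much as 0.07.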

Using the results of the first solution, a second analysis
was conducted with a reduced number of input variables. In
each of the initial solution factors, the single variable
having the largest factor loading was selected and the other
related variables were eliminated. Table XXV shows the
results of that solution, and Figure 6.2 shows the factor
plot.
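
The selection rule described above can be sketched as follows. This is an illustration only, in Python, using the loadings as printed in Table XXIV; the function name dominant_variable_per_factor is hypothetical, and ranking by absolute value of the loading is an assumption, since the text does not say how negative loadings were treated.

```python
# Factor pattern as printed in Table XXIV (columns FACT1 through FACT7).
LOADINGS = {
    "EDLVL":  [0.4302, 0.5861, 0.5024, -0.2544, -0.0624, -0.0693, -0.029],
    "AFQTP":  [0.9515, -0.1133, -0.1195, 0.0637, -0.0075, 0.1548, -0.024],
    "EIMCAT": [0.9060, -0.1220, -0.1652, -0.0598, -0.0096, 0.1478, 0.011],
    "NCOE":   [-0.0085, -0.4507, 0.6668, 0.2527, -0.0398, 0.0084, -0.134],
    "HIYRED": [0.3834, 0.6410, 0.4176, -0.3281, -0.0637, -0.0830, -0.124],
    "SEX":    [0.1735, 0.4212, -0.1113, 0.6516, 0.1857, -0.0736, -0.550],
    "OAFQT":  [0.9518, -0.1046, -0.1156, 0.0590, -0.0092, 0.1535, -0.023],
    "GTSCR":  [0.8238, -0.1128, 0.0090, 0.0331, -0.0464, 0.1350, 0.132],
    "PQSCR":  [0.4001, -0.2413, 0.1205, -0.1150, -0.7312, -0.4527, 0.115],
    "CMF":    [0.1677, 0.5200, -0.1449, 0.4985, -0.1171, -0.2587, 0.561],
    "PAYGD":  [0.1216, -0.3467, 0.6770, 0.3367, -0.1816, -0.0495, 0.151],
    "RACETH": [-0.3590, 0.3130, 0.2547, 0.1229, 0.4708, 0.6507, 0.216],
}


def dominant_variable_per_factor(loadings):
    """For each factor, name the variable with the largest absolute loading."""
    n_factors = len(next(iter(loadings.values())))
    chosen = []
    for j in range(n_factors):
        # Assumption: the sign of a loading is ignored when ranking.
        chosen.append(max(loadings, key=lambda name: abs(loadings[name][j])))
    return chosen


chosen = dominant_variable_per_factor(LOADINGS)
print(chosen)
```

Applied strictly to the printed loadings, this rule picks OAFQT, HIYRED, PAYGD, SEX, PQSCR, RACETH and CMF. The reduced solution reported in Table XXV carries NCOE and GTSCR in place of PAYGD and CMF, which suggests that the selection also drew on judgement beyond the largest loading alone.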
TABLE XXV   Reduced Principal Components Tabular Results

PRIOR COMMUNALITY ESTIMATES: ONE
Input Matrix of correlation coefficients

                 1       2       3       4       5       6       7
EIGENVALUE  2.1666  1.2063  1.0019  0.8703  0.8049  0.7081  0.2416
DIFFERENCE  0.9602  0.2044  0.1315  0.0654  0.0967  0.4665
PROPORTION  0.3095  0.1723  0.1431  0.1243  0.1150  0.1012  0.0345
CUMULATIVE  0.3095  0.4819  0.6250  0.7493  0.8643  0.9655  1.0000

7 FACTORS WILL BE RETAINED BY THE NFACTOR CRITERION

FACTOR PATTERN

            FACT1    FACT2    FACT3    FACT4    FACT5    FACT6    FACT7
NCOE        .0221   -.5422    .6941    .2656   -.3801   -.1071     .018
HIYRED      .3659    .5302    .3135   -.5162   -.2443   -.4001    -.004
SEX         .1803    .6532    .1514    .6993    .0899   -.1346    -.051
OAFQT       .8945    .0404   -.0412    .0502   -.0668    .2462    -.328
GTSCR       .8592   -.0374    .0154   -.0492   -.1259    .3664    -.328
PQSCR       .5069   -.3707    .2537   -.0613    .7141   -.2648    -.022
RACETH     -.4521    .3275    .5799   -.1589    .2487    .5031     .037

           Intell     Acad     NCOE      SEX    PQSCR     Race
            Tests

FINAL COMMUNALITY ESTIMATES: TOTAL = 7.000000

  NCOE   HIYRED      SEX   NOAFQT    GTSCR    PQSCR   RACETH
1.0000   1.0000   1.0000   1.0000   1.0000   1.0000   1.0000
PLOT OF FACTOR PATTERN FOR FACTOR1 AND FACTOR2

[Plot: FACTOR1 on the vertical axis, FACTOR2 on the horizontal axis]

NCOE=A   HIYRED=B   SEX=C   OAFQT=D   GTSCR=E   PQSCR=F   RACETH=G

Figure 6.2  Factor Plot
Restricting the input to the strongest unique variables
results in an almost complete separation into single factors.
The only exception is the grouping of GTSCR and OAFQT (E and
D). This is not surprising, since both scores are composed
from the same set of tests in the ASVAB. Thus,
the decision to eliminate GTSCR from earlier regression
models makes sense from the factor analysis perspective as
well.
E.   SUMMARY  OF  FINDINGS

The application of principal components and factor
analysis confirmed many of the patterns of dependency and
redundancy among the study variables. It confirmed the
choices for unique variables in the regression as developed
in Chapter IV, and gave a good second opinion for deciding
which variables could be set aside with little effect on the
model.
VII.   CONCLUSION

A.   OVERALL  FINDINGS 

There is strong statistical evidence to support the
proposition that success in the Army, as measured by
promotion rate, is related to the individual's intelligence
test scores and previous academic background. The
1980 normed AFQT score and the individual's highest year of
education at time of entry are the most important
indicators of future promotion rate.
The highest year of education at time of entry is the more
important measure, but changes in its discrete scale
represent very substantial changes in academic background.
OAFQT is not nearly as important as HIYRED and can
independently affect the predicted promotion rate by only up
to ten percent.

While in service, the individual's Performance
Qualification Test scores and his attendance at NCO
schooling are indicative of a faster promotion rate.

The statistical evidence for these observations lies in
the significantly increasing promotion rate averages
across ascending levels of the explanatory measures in
ANOVA and ANCOVA analysis. This argument can be
supplemented, and those differences seen more concretely,
by a simpler comparison of top performers versus
the sample averages.

Considerable  variance  of  promotion  rate  exists  across  any 
of  the  levels  of  the  discrete  explanatory  variables,  and 
within  any  of  the  categorical  variables.  There  is  a  dilemma 
in  designing  an  effective  dependent  variable.  While 
controlling  categorical  variables  such  as  CMF  and  Paygrade, 
the  effects  of  the  other  variables  become  more  apparent  and 
significant.  However,  the  ability  of  the  model  to  explain 
variance  is  significantly  diminished. 

Selecting  a  set  of  the  most  important  and  unique 
explanatory  variables  was  achieved  via  two  methods.  A 
successive,  increasing  dimension  procedure  distilled  a  set  of 
unique  explanatory  variables.  This  method  relied  upon 
developing  detailed  familiarity  with  each  variable.  In  the 
process  hypothesis  testing  was  used  to  eliminate 
insignificant  contributors  and  identify  the  most  important 
variable  from  a  group  of  related  variables.  This  restricted 
set  of  explanatory  variables  was  confirmed  with  the  use  of 
principal  components,  a  method  which  uses  a  mathematical 
approach  to  identify  orthogonal  and  unique  variables. 

When using inferential procedures, the resulting model
met regression assumptions, both parametrically and
nonparametrically. Further, the model estimates are
reproducible with an alternate data set.

Although the model is technically acceptable, it is only
accurate in predicting promotion values for population
subcategories. The low R² value and high mean square error
terms found during regression were manifested in model
testing. When making predictions based on incremental
changes in AFQT, the predictions were close to the sample
data values, but the upper and lower bounds were so wide
that the resulting predictions were not useful.

The poor performance of the predictive model can be
attributed to two possible reasons. First, there may exist
some unspecified predictor variable which could be used to
better account for variance. Second, there may be a
significant element of inexplicable chance in the
promotion rate of any given individual.

In the case of the first reason, it should be observed
that the number of available entries held on a given
individual at either DMDC or MILPERCEN is limited. Of the
one hundred and forty data fields, this study considered all
entries which were felt to have potential merit as an
explanatory variable. This included several versions
expressing the same fundamental quality. Of the twelve
variables considered, the final number of significant
variables was reduced to only six. Overall, there are few
significant and unique measures available to use as
predictors. To discover additional explanatory variables
would require establishment of new personnel data elements in
those data bases. Potential candidates include evaluation
report averages or, possibly, the results of a personality
composite test. Alternatively, the quality of information on
academic performance could be increased, such as by the
inclusion of grade averages from high school attendance
periods. The utility of this additional data would then have
to be evaluated in a manner similar to this thesis.

The second reason given for error is a more probable
explanation, for the subject matter of this study is people,
and not a more deterministic physical phenomenon. The
resolution of a cause-and-effect relationship is more subtle
and more difficult to verify. Although this condition does not
have a mathematical remedy, the judgement of whether or not
even a small, highly variable measure of trend is sufficient
still lies with the analyst and his ability to present that
judgement to decision makers.

B.   POLICY  RECOMMENDATIONS 

The  first  question  that  must  be  answered  in  this  section 
is  whether  or  not  having  a  predictive  model  is  necessary  to 
make  policy  decisions  regarding  promotion  or  accession.  The 
answer  offered  in  this  document  is  that  it  is  not.  There  is 
sufficiently  reliable  information  resulting  from  hypothesis 
testing  and  subpopulation  analysis  to  make  cogent 
observations  and  decisions  with. 

From the results of this investigation, accession policy
makers should closely manage the two attributes of OAFQT and
HIYRED. This recommendation is more a confirmation than
a proposal. The 1984 Defense Authorization Act already
places constraints on AFQT category and high school diploma
status.

The two in-service attributes that should be managed are
the Performance Qualification Score and attendance at NCO
schooling. One approach would be to tie scores on these
attributes directly to promotion, in the form of promotion
points or a minimum threshold scale. Unfortunately, this may
artificially force NCO's of less potential and aggressiveness
into categories with the more competent individuals. The
result may be a lessening of the discriminatory effectiveness
of the two measures.

If the individual were allowed to achieve his or her
score and pursue in-service education independent of
promotion policy, the ability of these variables to
discriminate would be better. However, not tying these
scores directly to promotion point values or thresholds
should not mean that either measure would go unused. A
policy in which promotion boards were still instructed to
review an individual's scores, together with notification of
this review policy to the NCO population, would allow for
self-selection by the more ambitious individuals.

C.   SUGGESTIONS  FOR  FURTHER  RESEARCH 

One disturbing observation of this study was the apparent
disparity among race and ethnic groups in terms of AFQT and
promotion rates. As pointed out by Daula (1985), the
explanation of this disparity cannot be seen in an aggregate
promotion data approach, but rather requires a duration model
approach following a set group of individual soldiers over
time. [Ref. 11: pp. 7-9] His paper reports that this disparity
is a result of attrition. Specifically, the shifting of
subcategory promotion averages is a result of different
retention patterns among race and ethnic groups, and not due
to a racially sensitive promotion system.

A  study  to  determine  the  magnitude  and  underlying  reasons 
for  the  different  retention  patterns,  and  to  test  this 
hypothesis,  would  have  considerable  merit. 




APPENDIX  A 


CAREER MANAGEMENT FIELDS AND FREQUENCIES

                                        CUMULATIVE  CUMULATIVE
MOSNAME        CMF  FREQUENCY  PERCENT   FREQUENCY     PERCENT

Infantry        11       4320     11.4        4320        11.4
Cbt Engineer    12       1030      2.7        5350        14.1
Artillery       13       2780      7.3        8130        21.5
Air Defense     16        851      2.2        8981        23.7
Special Ops     18        244      0.6        9225        24.4
Armor           19       2434      6.4       11659        30.8
Hawk Missile    23        187      0.5       11846        31.3
Nike Missile    27        352      0.9       12198        32.2
Tac Radar       28         40      0.1       12238        32.3
Tac Radar       29        625      1.7       12863        34.0
Communication   31       3265      8.6       16128        42.6
Elect Warfare   33         30      0.1       16158        42.7
Tech Drafter    51        619      1.6       16777        44.3
Chem Warfare    54        529      1.4       17306        45.7
Explosive Ord   55        400      1.1       17706        46.8
Repair          63       3766      9.9       21472        56.7
Cargo Spec      64       1041      2.8       22513        59.5
A/C Repair      67       1090      2.9       23603        62.4
Admin Spec      71       3020      8.0       26623        70.3
Programmer      74        423      1.1       27046        71.4
Supply          76       2677      7.1       29723        78.5
Recruiter       79        106      0.3       29829        78.8
Topo Eng        81         65      0.2       29894        79.0
AV Spec         84        157      0.4       30051        79.4
Medical         91       2498      6.6       32549        86.0
Lab Spec        92        444      1.2       32993        87.2
Air Traffic     93        175      0.5       33168        87.6
Food SVC        94        919      2.4       34087        90.0
Mil Police      95       1674      4.4       35761        94.5
Intelligence    96        789      2.1       36550        96.6
Musician        97        176      0.5       36726        97.0
EW/SIGINT       98       1125      3.0       37851       100.0
APPENDIX  B

AFQT TRANSFORMATION EQUIVALENT SCORES

Armed Forces Qualification Test (AFQT)
Equivalent Percentile Scores for 1944
Mobilization Population and 1980 Youth Population

1944  1980      1944  1980      1944  1980

   1     1        34    33        67    66
   2     1        35    34        68    67
   3     2        36    35        69    68
   4     2        37    35        70    69
   5     3        38    36        71    70
   6     4        39    37        72    71
   7     5        40    38        73    72
   8     6        41    38        74    73
   9     6        42    39        75    74
  10     8        43    40        76    75
  11     8        44    41        77    76
  12    10        45    42        78    77
  13    11        46    42        79    78
  14    12        47    43        80    79
  15    14        48    44        81    80
  16    15        49    46        82    81
  17    16        50    47        83    83
  18    17        51    48        84    84
  19    18        52    49        85    85
  20    19        53    49        86    87
  21    21        54    50        87    89
  22    22        55    51        88    91
  23    23        56    52        89    92
  24    24        57    53        90    93
  25    25        58    54        91    94
  26    26        59    56        92    95
  27    26        60    57        93    95
  28    27        61    58        94    97
  29    28        62    59        95    98
  30    29        63    60        96    98
  31    30        64    62        97    99
  32    31        65    63        98    99
  33    32        66    65        99    99